Abstract: Following Csiszár's approach in classical information
theory, we show that the quantum $\alpha$-relative entropies with
parameter $\alpha \in (0, 1)$ can be represented as generalized cutoff
rates, and hence provide a direct operational interpretation of
the quantum $\alpha$-relative entropies. We also show that various
generalizations of the Holevo capacity, defined in terms of the
$\alpha$-relative entropies, coincide for the parameter range
$\alpha \in (0, 2]$, and show an upper bound on the one-shot
$\varepsilon$-capacity of a classical-quantum channel in terms of these
capacities.

Index Terms: Rényi relative entropies, Hoeffding distances,
generalized cutoff rates, quantum channels, $\alpha$-capacities,
one-shot capacities.



I. Introduction 

In information theory, it is convenient to measure the
distance of states (probability distributions in the classical,
and density operators in the quantum case) with measures that
do not satisfy the axioms of a metric. In a broad sense, a
statistical distance is a function taking non-negative values on
pairs of states, that satisfies some convexity properties in its
arguments and which cannot increase when its arguments are
subjected to a stochastic operation. Probably the most popular
statistical distance, for a good reason, is the relative entropy
$S$, defined for density operators $\rho, \sigma$ as

$$S(\rho\|\sigma) := \begin{cases} \operatorname{Tr}\rho(\log\rho - \log\sigma), & \text{if } \operatorname{supp}\rho \le \operatorname{supp}\sigma, \\ +\infty, & \text{otherwise.} \end{cases}$$



While various generalizations of the relative entropy, leading 
to statistical distances in the above sense, are easy to define, 
they are not equally important, and the relevant ones are those 
that appear in answers to natural statistical problems, or in 
other terms, those that admit an operational interpretation. 

The operational interpretation of the relative entropy is 
given in the problem of asymptotic binary state discrimination, 
where one is provided with several identical copies of a 
quantum system and the knowledge that the state of the system 
is either $\rho$ (null hypothesis) or $\sigma$ (alternative hypothesis),
where $\rho$ and $\sigma$ are density operators on the system's Hilbert
space $\mathcal{H}$, and one's goal is to make a good guess for the
true state of the system, based on measurement results on the
copies. It is easy to see that the most general inference scheme,
based on measurements on $n$ copies, can be described by a
binary positive operator valued measurement $(T, I-T)$, where
$T \in \mathcal{B}(\mathcal{H}^{\otimes n})$, $0 \le T \le I$, and the guess
is $\rho$ if the outcome corresponding to $T$ occurs, and $\sigma$ otherwise.
The probability of a wrong guess is
$\alpha_n(T) := \operatorname{Tr}\rho^{\otimes n}(I-T)$ if the true state is
$\rho$ (error probability of the first kind) and
$\beta_n(T) := \operatorname{Tr}\sigma^{\otimes n}T$ if the true state is
$\sigma$ (error probability of the second kind). Unless
the two states have orthogonal supports, there is a trade-off
between the two error probabilities, and it is not possible to
find a measurement that makes both error probabilities equal
to zero. As it turns out, if we require the error probabilities
of the first kind to go to zero asymptotically then, under an
optimal sequence of measurements, the error probabilities of
the second kind decay exponentially, and the decay rate is
given by $S(\rho\|\sigma)$ [1], [2]. On the other hand, if we impose
the stronger condition that the error probabilities of the first
kind go to zero asymptotically as $\alpha_n \sim 2^{-nr}$ for some $r > 0$
then, under an optimal sequence of measurements, the error
probabilities of the second kind decay as
$\beta_n \sim 2^{-nH_r(\rho\|\sigma)}$,
where $H_r(\rho\|\sigma)$ is the Hoeffding distance of $\rho$ and $\sigma$ with
parameter $r$ [3]-[6].
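The two error probabilities can be made concrete with a small numerical sketch. The example below is illustrative only: the helper names `kron_power` and `error_probs` are hypothetical, logarithms are taken in base 2 (an assumption suggested by the $2^{-nr}$ scaling above), and the states are chosen so that the numbers are easy to check by hand. For $\rho = |0\rangle\langle 0|$ and $\sigma = I/2$, the test $T = \rho^{\otimes n}$ never rejects $\rho$, and its error of the second kind decays at the rate $S(\rho\|\sigma) = 1$ bit.

```python
import numpy as np
from functools import reduce

def kron_power(M, n):
    """n-fold tensor power of a matrix M."""
    return reduce(np.kron, [M] * n)

def error_probs(rho, sigma, T, n):
    """Errors of the first and second kind of the test (T, I - T) on n copies."""
    rho_n, sigma_n = kron_power(rho, n), kron_power(sigma, n)
    identity = np.eye(T.shape[0])
    alpha_n = np.trace(rho_n @ (identity - T)).real  # true state rho, guess sigma
    beta_n = np.trace(sigma_n @ T).real              # true state sigma, guess rho
    return alpha_n, beta_n

rho = np.diag([1.0, 0.0])      # pure state |0><0|
sigma = np.eye(2) / 2          # maximally mixed qubit state
n = 4
T = kron_power(rho, n)         # accept rho only if every copy is found in |0>
alpha_n, beta_n = error_probs(rho, sigma, T, n)
rate = -np.log2(beta_n) / n    # exponential decay rate of beta_n, in bits
```

Here `rate` coincides with the relative entropy because the supports are nested and the test is exact; for generic states the rate is only reached asymptotically.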

The Hoeffding distances can be obtained as a certain transform
of the $\alpha$-relative entropies that were defined by Rényi,
based on purely axiomatic considerations [7]. While the above
state discrimination result relates Rényi's $\alpha$-relative entropies
to statistical distances with operational interpretation, a direct
operational interpretation of the Rényi relative entropies was
missing for a long time. This gap was filled in the classical
case by Csiszár [8], who defined the operational notion of
cutoff rates and showed that the $\alpha$-relative entropies arise
as cutoff rates in state discrimination problems. In Section
III we follow Csiszár's approach to show that the $\alpha$-relative
entropies can be given the same operational interpretation in
the quantum case, at least for the parameter range $\alpha \in (0, 1)$.

Given a state shared by several parties, and a statistical
distance $D$, the $D$-distance of the state from the set of
uncorrelated states yields a measure of correlations among the parties.
For instance, a popular measure of quantum correlations is
the relative entropy of entanglement [9], which is the relative
entropy distance of a multipartite quantum state from the set
of separable (i.e., only classically correlated) states. Similarly,
a measure of the total amount of correlations between parties
$A$ and $B$ sharing a bipartite quantum state $\rho_{AB}$ can be defined
by the $D$-distance of $\rho_{AB}$ from the set of product states,

$$I_D(A:B\,|\,\rho_{AB}) := \inf_{\sigma_A \in \mathcal{S}(\mathcal{H}_A),\ \sigma_B \in \mathcal{S}(\mathcal{H}_B)} D(\rho_{AB}\,\|\,\sigma_A \otimes \sigma_B),$$

where $\mathcal{S}(\mathcal{H}_A)$ and $\mathcal{S}(\mathcal{H}_B)$ denote the state
spaces of parties $A$ and $B$, respectively. When the statistical distance
is the relative entropy $S$, there is a unique product state closest to
$\rho_{AB}$, which is the product $\rho_A \otimes \rho_B$ of the marginals of
$\rho_{AB}$, and we have the identities

$$I_S(A:B\,|\,\rho_{AB}) = S(\rho_{AB}\,\|\,\rho_A \otimes \rho_B) = \inf_{\sigma_A \in \mathcal{S}(\mathcal{H}_A)} S(\rho_{AB}\,\|\,\sigma_A \otimes \rho_B) = \inf_{\sigma_B \in \mathcal{S}(\mathcal{H}_B)} S(\rho_{AB}\,\|\,\rho_A \otimes \sigma_B). \tag{1}$$



These identities, however, are no longer valid if $S$ is
replaced with some other statistical distance $D$, and one may
wonder which formula gives the "right" measure of correlations,
i.e., which one admits an operational interpretation.
When $D$ is an $\alpha$-relative entropy or a Hoeffding distance, an
operational interpretation can be obtained for
$D(\rho_{AB}\,\|\,\rho_A \otimes \rho_B)$ in the setting of discriminating
$\rho_{AB}$ from $\rho_A \otimes \rho_B$, as
described above. It seems, however, that when $D$ is an
$\alpha$-relative entropy and the aim is to measure correlations between
the input and the output of a stochastic communication channel
then it is the last formula in (1) (with $S$ replaced with an
$\alpha$-relative entropy) that yields a natural operational interpretation,
as we will see below.

By a classical-quantum communication channel (or simply
a channel) we mean a map $W : \mathcal{X} \to \mathcal{S}(\mathcal{H})$, where
$\mathcal{X}$ is a set and $\mathcal{H}$ is a Hilbert space, which we assume to
be finite-dimensional. Note that there is no restriction on the cardinality
of $\mathcal{X}$, and this formulation encompasses both the case of
classical channels (i.e., when the range of $W$ is commutative) and
the standard formalism for quantum channels (i.e., when $\mathcal{X}$ is
the state space of an input Hilbert space and $W$ is a completely
positive trace-preserving map). A "lifting" of the channel can
be defined by $\hat{W} : \mathcal{X} \to \mathcal{S}(\mathcal{H}_{\mathcal{X}} \otimes \mathcal{H})$,
$\hat{W} : x \mapsto \delta_x \otimes W_x$,
where $\mathcal{H}_{\mathcal{X}}$ is some auxiliary Hilbert space with dimension
equal to the cardinality of $\mathcal{X}$, and $\delta_x := |e_x\rangle\langle e_x|$
for some orthonormal system $\{e_x\}_{x \in \mathcal{X}}$ in
$\mathcal{H}_{\mathcal{X}}$. The expectation value of
$\hat{W}$ with respect to a finitely supported probability measure
$p \in \mathcal{M}_f(\mathcal{X})$ is a classical-quantum state
$E_p\hat{W} = \sum_x p(x)\,\delta_x \otimes W_x$
on the joint system of the input and the output of the channel,
and its marginals are given by
$\operatorname{Tr}_{\mathcal{H}} E_p\hat{W} = \hat{p} := \sum_x p(x)\,\delta_x$
and $\operatorname{Tr}_{\mathcal{H}_{\mathcal{X}}} E_p\hat{W} = E_pW = \sum_x p(x)\,W_x$.
The amount of correlations between the input and the output in the state
$E_p\hat{W}$, as measured by the relative entropy, can be written in various
equivalent ways:

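$$I_S(p; W) := S(E_p\hat{W}\,\|\,\hat{p} \otimes E_pW) = \inf_{\sigma \in \mathcal{S}(\mathcal{H})} S(E_p\hat{W}\,\|\,\hat{p} \otimes \sigma) \tag{2}$$

$$= \sum_x p(x)\,S(W_x\,\|\,E_pW) = \inf_{\sigma \in \mathcal{S}(\mathcal{H})} \sum_x p(x)\,S(W_x\,\|\,\sigma) \tag{3}$$

$$= S(E_pW) - \sum_x p(x)\,S(W_x). \tag{4}$$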
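The equivalence of the expressions in (2)-(4) can be checked numerically on a toy channel. The following is a minimal sketch under stated assumptions: the two-input qubit channel `W`, the input distribution `p`, and all helper names are made up for illustration, logarithms are base 2, and the states are full rank so that matrix logarithms exist.

```python
import numpy as np

def logm2(A):
    """Base-2 logarithm of a strictly positive Hermitian matrix."""
    w, v = np.linalg.eigh(A)
    return (v * np.log2(w)) @ v.conj().T

def rel_entropy(rho, sigma):
    """S(rho || sigma) for full-rank arguments, in bits."""
    return np.trace(rho @ (logm2(rho) - logm2(sigma))).real

def entropy(rho):
    """von Neumann entropy of a full-rank state, in bits."""
    return -np.trace(rho @ logm2(rho)).real

def cq_state(p, W):
    """Classical-quantum state sum_x p(x) delta_x (x) W_x of the lifted channel."""
    n = len(p)
    return sum(p[x] * np.kron(np.diag(np.eye(n)[x]), W[x]) for x in range(n))

# Hypothetical channel with two inputs and full-rank qubit outputs.
W = [np.array([[0.8, 0.1], [0.1, 0.2]]), np.array([[0.3, -0.2], [-0.2, 0.7]])]
p = np.array([0.4, 0.6])

avg = sum(px * Wx for px, Wx in zip(p, W))                          # E_p W
via_cq = rel_entropy(cq_state(p, W), np.kron(np.diag(p), avg))      # first form in (2)
via_outputs = sum(px * rel_entropy(Wx, avg) for px, Wx in zip(p, W))  # first form in (3)
via_entropies = entropy(avg) - sum(px * entropy(Wx) for px, Wx in zip(p, W))  # (4)
```

All three numbers agree up to floating-point error; the infimum forms in (2) and (3) are attained at $\sigma = E_pW$ and are not searched for here.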



The Holevo-Schumacher-Westmoreland theorem [10], [11]
shows that the asymptotic information transmission capacity
of a channel, under the assumption of product encoding, is
given by the Holevo capacity

$$\chi^*_S(W) := \sup_{p \in \mathcal{M}_f(\mathcal{X})} I_S(p; W), \tag{5}$$

which is the maximal amount of correlation that can be created
between the classical input and the quantum output in a
classical-quantum state of the form $E_p\hat{W}$,
$p \in \mathcal{M}_f(\mathcal{X})$. A
geometric interpretation of the Holevo capacity was given
in [12], where it was shown that the Holevo capacity of a
channel $W$ is equal to the relative entropy radius
$R_S(\operatorname{ran} W)$ of its range, where the $D$-radius of a subset
$\Sigma \subset \mathcal{S}(\mathcal{H})$ for a
statistical distance $D$ is defined as

$$R_D(\Sigma) := \inf_{\sigma \in \mathcal{S}(\mathcal{H})}\ \sup_{\rho \in \Sigma} D(\rho\,\|\,\sigma). \tag{6}$$



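Not surprisingly, the identities in (2)-(4) do not hold for
a general statistical distance $D$, and one may define various
formal generalizations of the Holevo capacity. Here we will
be interested in the quantities

$$\chi^*_{D,0}(W) := \sup_{p \in \mathcal{M}_f(\mathcal{X})} D(E_p\hat{W}\,\|\,\hat{p} \otimes E_pW), \tag{7}$$

$$\chi^*_{D,1}(W) := \sup_{p \in \mathcal{M}_f(\mathcal{X})}\ \inf_{\sigma \in \mathcal{S}(\mathcal{H})} D(E_p\hat{W}\,\|\,\hat{p} \otimes \sigma), \tag{8}$$

$$\chi^*_{D,2}(W) := \sup_{p \in \mathcal{M}_f(\mathcal{X})}\ \inf_{\sigma \in \mathcal{S}(\mathcal{H})} \sum_x p(x)\,D(W_x\,\|\,\sigma), \tag{9}$$

$$R_D(\operatorname{ran} W) := \inf_{\sigma \in \mathcal{S}(\mathcal{H})}\ \sup_{x \in \mathcal{X}} D(W_x\,\|\,\sigma). \tag{10}$$

The capacities $\chi^*_{D,1}(W)$, $\chi^*_{D,2}(W)$ and
$R_D(\operatorname{ran} W)$ were
shown to be equal in [8] when the channel is classical and $D$ is
an $\alpha$-relative entropy $S_\alpha$ with arbitrary non-negative parameter
$\alpha$, and in [13], the identity
$\chi^*_{S_\alpha,1}(W) = R_{S_\alpha}(\operatorname{ran} W)$ was
shown for quantum channels and $\alpha \in (1, +\infty)$. In Section
IV we follow the approach of [8] to show that
$\chi^*_{D,1}(W) = \chi^*_{D,2}(W) = R_D(\operatorname{ran} W)$ for
classical-quantum channels when
$D$ is an $\alpha$-relative entropy with parameter $\alpha \in (0, 2]$.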

The Holevo-Schumacher-Westmoreland theorem identifies
the Holevo capacity (5) as the optimal rate of information
transmission through the channel in an asymptotic scenario,
under the assumption that the noise described by the channel
occurs independently at consecutive uses of the channel
(memoryless channel). However, in practical applications one can
use a channel only finitely many times, and the memoryless
condition might not always be realistic, either. Hence, it is
desirable to have bounds on the information transmission
capacity of a channel for finitely many uses. For a given
threshold $\varepsilon > 0$, the one-shot $\varepsilon$-capacity of the channel is
the maximal number of bits that can be transmitted by one
single use of the channel, with an average error not exceeding
$\varepsilon$. Note that finitely many (possibly correlated) uses of a
channel can be described as the action of one single channel
acting on sequences of inputs, and hence the study of one-shot
capacities addresses the generalization of coding theorems in
the direction of finitely many uses and possibly correlated
channels at the same time. In [14] a lower bound on the
one-shot $\varepsilon$-capacity of an arbitrary classical-quantum channel
$W$ was given in terms of the Rényi capacities
$\chi^*_{S_\alpha,0}(W)$ with parameter $\alpha \in [0, 1)$. This bound was
shown to be asymptotically optimal in the sense of yielding the Holevo
capacity as a lower bound in the asymptotic limit, but no upper
bound of similar form has been known up till now. In Section
V we show an upper bound on the one-shot $\varepsilon$-capacity in terms
of the Rényi capacities $\chi^*_{S_\alpha,1}(W)$ with parameter
$\alpha > 1$ that is
again asymptotically optimal in the above sense. It remains an
open question whether the capacities $\chi^*_{S_\alpha,0}(W)$ and
$\chi^*_{S_\alpha,1}(W)$ are equal for a given $\alpha$. To the best of our
knowledge, the answer to this question is unknown even in the classical case.

II. Preliminaries on the Rényi relative entropies

Let $\mathcal{H}$ be a finite-dimensional Hilbert space with
$d := \dim\mathcal{H}$. We will use the notations
$\mathcal{B}(\mathcal{H})_+$ and $\mathcal{B}(\mathcal{H})_{++}$ to
denote the positive semidefinite and the strictly positive definite
operators on $\mathcal{H}$, respectively. Similarly, we denote the
set of density operators (positive semidefinite operators with
unit trace) by $\mathcal{S}(\mathcal{H})$, and use the notation
$\mathcal{S}(\mathcal{H})_{++}$ for the set
of invertible density operators. We will use the conventions
$0^\alpha := 0$, $\alpha \in \mathbb{R}$, and $\log 0 := -\infty$,
$\log +\infty := +\infty$.
By the former, powers of a positive semidefinite operator are
only taken on its support, i.e., if the spectral decomposition
of an $A \in \mathcal{B}(\mathcal{H})_+$ is $A = \sum_k a_k P_k$, where all
$a_k > 0$, then $A^\alpha := \sum_k a_k^\alpha P_k$ for all
$\alpha \in \mathbb{R}$. In particular, $A^0$ is the
projection onto the support of $A$.

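Following [15], we define for every $\alpha \in [0, +\infty) \setminus \{1\}$ the
$\alpha$-quasi-relative entropy of an $A \in \mathcal{B}(\mathcal{H})_+$ with
respect to a $B \in \mathcal{B}(\mathcal{H})_+$ as

$$Q_\alpha(A\,\|\,B) := \begin{cases} \operatorname{sign}(\alpha - 1)\operatorname{Tr} A^\alpha B^{1-\alpha}, & \operatorname{supp} A \le \operatorname{supp} B \text{ or } \alpha \in [0, 1), \\ +\infty, & \text{otherwise.} \end{cases}$$

The Rényi $\alpha$-relative entropy of $A$ with respect to $B$ is then
defined as

$$S_\alpha(A\,\|\,B) := \frac{1}{\alpha - 1} \log \operatorname{sign}(\alpha - 1)\,Q_\alpha(A\,\|\,B).$$

Note that $S_\alpha(A\,\|\,B) = +\infty$ if $\operatorname{supp} A \perp \operatorname{supp} B$,
or if $\operatorname{supp} A \nleq \operatorname{supp} B$ and $\alpha > 1$. In all
other cases, $S_\alpha(A\,\|\,B)$ is a finite number, given by
$S_\alpha(A\,\|\,B) = \frac{1}{\alpha - 1} \log \operatorname{Tr} A^\alpha B^{1-\alpha}$.
Note that for $\alpha \in (0, 1)$, we have

$$S_{1-\alpha}(A\,\|\,B) = \frac{1-\alpha}{\alpha}\,S_\alpha(B\,\|\,A). \tag{11}$$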
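The definitions above translate directly into a few lines of numerical code. The sketch below is not taken from the paper: the function names `mpow` and `renyi` are hypothetical, logarithms are base 2 by assumption, and supports are detected with a small eigenvalue threshold. It follows the convention that powers are taken on the support only, and it checks the symmetry relation (11) on a pair of non-commuting qubit states.

```python
import numpy as np

def mpow(A, t, tol=1e-12):
    """Power of a PSD matrix taken on its support only; mpow(A, 0) is the support projection."""
    w, v = np.linalg.eigh(A)
    keep = w > tol
    return (v[:, keep] * w[keep] ** t) @ v[:, keep].conj().T

def renyi(A, B, alpha, tol=1e-12):
    """Renyi alpha-relative entropy S_alpha(A||B) in bits, for alpha >= 0, alpha != 1."""
    if alpha > 1:
        outside = np.eye(len(A)) - mpow(B, 0.0)   # projection onto the complement of supp B
        if np.trace(A @ outside).real > 1e-9:     # supp A is not contained in supp B
            return np.inf
    q = np.trace(mpow(A, alpha) @ mpow(B, 1.0 - alpha)).real
    return np.inf if q <= tol else np.log2(q) / (alpha - 1.0)

A = np.array([[0.8, 0.1], [0.1, 0.2]])
B = np.array([[0.3, -0.2], [-0.2, 0.7]])
a = 0.3
lhs = renyi(A, B, 1 - a)                 # S_{1-alpha}(A||B)
rhs = (1 - a) / a * renyi(B, A, a)       # right-hand side of (11)

# Support conventions: B_sing is singular, so S_alpha is +inf for alpha > 1 but finite below 1.
A_mixed = np.diag([0.5, 0.5])
B_sing = np.diag([1.0, 0.0])
above_one = renyi(A_mixed, B_sing, 2.0)
below_one = renyi(A_mixed, B_sing, 0.5)
```

In the singular example, $\operatorname{Tr} A^{1/2}B^{1/2} = 2^{-1/2}$, so the value at $\alpha = 1/2$ is exactly one bit.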

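It is easy to see that if $\operatorname{Tr} A = 1$ then

$$S_1(A\,\|\,B) := \lim_{\alpha \to 1} S_\alpha(A\,\|\,B) = S(A\,\|\,B),$$

where $S(A\,\|\,B)$ is the relative entropy

$$S(A\,\|\,B) := \begin{cases} \operatorname{Tr} A(\log A - \log B), & \operatorname{supp} A \le \operatorname{supp} B, \\ +\infty, & \text{otherwise.} \end{cases}$$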



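Operator monotonicity of the function $x \mapsto x^\alpha$, $x > 0$,
for $\alpha \in [0, 1]$ yields that

$$Q_\alpha(A\,\|\,B + C) \le Q_\alpha(A\,\|\,B) \quad\text{and}\quad S_\alpha(A\,\|\,B + C) \le S_\alpha(A\,\|\,B)$$

for any $A, B, C \in \mathcal{B}(\mathcal{H})_+$ and $\alpha \in [0, 1]$, and
the same holds for $\alpha > 1$ if $B$ and $C$ commute. In particular, for fixed
$A, B \in \mathcal{B}(\mathcal{H})_+$, the maps
$0 < \varepsilon \mapsto Q_\alpha(A\,\|\,B + \varepsilon I)$ and
$0 < \varepsilon \mapsto S_\alpha(A\,\|\,B + \varepsilon I)$ are monotonic
decreasing, and it is easy to see that, for any $\alpha \in [0, +\infty)$,

$$Q_\alpha(A\,\|\,B) = \sup_{\varepsilon > 0} Q_\alpha(A\,\|\,B + \varepsilon I), \tag{12}$$

$$S_\alpha(A\,\|\,B) = \sup_{\varepsilon > 0} S_\alpha(A\,\|\,B + \varepsilon I). \tag{13}$$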

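For $\alpha \in [0, 2] \setminus \{1\}$, the $\alpha$-quasi-relative entropies
have the monotonicity property [15]-[17]

$$Q_\alpha(\Phi(A)\,\|\,\Phi(B)) \le Q_\alpha(A\,\|\,B), \qquad A, B \in \mathcal{B}(\mathcal{H})_+, \tag{14}$$

where $\Phi$ is any completely positive trace-preserving (CPTP)
map on $\mathcal{B}(\mathcal{H})$. As a consequence, the $\alpha$-quasi-relative
entropies are jointly convex in their arguments for
$\alpha \in [0, 2] \setminus \{1\}$:

$$Q_\alpha\Big(\sum_i p_i A_i\,\Big\|\,\sum_i p_i B_i\Big) \le \sum_i p_i\,Q_\alpha(A_i\,\|\,B_i), \tag{15}$$

where $A_i, B_i \in \mathcal{B}(\mathcal{H})_+$, and $\{p_i\}$ is a finite
probability distribution [15], [18], [19].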

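The monotonicity property (14) of the $\alpha$-quasi-relative
entropies yields that, for any CPTP map $\Phi$ on $\mathcal{B}(\mathcal{H})$ and
$\alpha \in [0, 2]$,

$$S_\alpha(\Phi(A)\,\|\,\Phi(B)) \le S_\alpha(A\,\|\,B), \qquad A, B \in \mathcal{B}(\mathcal{H})_+.$$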
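As an illustration of this monotonicity, one can compare the Rényi relative entropies before and after a simple CPTP map. The sketch below uses a depolarizing map, which is a standard example of a CPTP map but is our choice and not one made in the text; `mpow`, `renyi` and `depolarize` are hypothetical helper names, and logarithms are base 2 by assumption.

```python
import numpy as np

def mpow(A, t, tol=1e-12):
    """Power of a PSD matrix taken on its support only."""
    w, v = np.linalg.eigh(A)
    keep = w > tol
    return (v[:, keep] * w[keep] ** t) @ v[:, keep].conj().T

def renyi(A, B, alpha):
    """S_alpha(A||B) in bits for full-rank B and alpha != 1."""
    q = np.trace(mpow(A, alpha) @ mpow(B, 1.0 - alpha)).real
    return np.log2(q) / (alpha - 1.0)

def depolarize(X, lam):
    """Depolarizing map X -> (1 - lam) X + lam Tr(X) I/d, CPTP for lam in [0, 1]."""
    d = X.shape[0]
    return (1 - lam) * X + lam * np.trace(X) * np.eye(d) / d

A = np.array([[0.8, 0.1], [0.1, 0.2]])
B = np.array([[0.3, -0.2], [-0.2, 0.7]])

# Gap S_alpha(A||B) - S_alpha(Phi(A)||Phi(B)) for a few parameters in [0, 2].
gaps = [renyi(A, B, a) - renyi(depolarize(A, 0.4), depolarize(B, 0.4), a)
        for a in (0.5, 1.5, 2.0)]
```

Every entry of `gaps` is non-negative, as the inequality above requires for $\alpha \in [0, 2]$.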

Convexity of the function $\frac{1}{\alpha - 1}\log$ for $\alpha \in [0, 1)$
yields, by (15), that for $\alpha \in [0, 1]$,

$$S_\alpha\Big(\sum_i p_i A_i\,\Big\|\,\sum_i p_i B_i\Big) \le \sum_i p_i\,S_\alpha(A_i\,\|\,B_i) \tag{16}$$

for any finite probability distribution $\{p_i\}$ and
$A_i, B_i \in \mathcal{B}(\mathcal{H})_+$. Note that the joint convexity (15) of
the $\alpha$-quasi-relative entropies for $\alpha \in (1, 2]$ is not inherited by
the corresponding Rényi relative entropies, as $\frac{1}{\alpha - 1}\log$ is not
convex for $\alpha > 1$; for a counterexample, see e.g. [20]. Actually, the
example of [20] shows that the Rényi relative entropies are
not even convex in their first argument for $\alpha > 1$. However,
we have the following:

Theorem II.1. For a fixed $A \in \mathcal{B}(\mathcal{H})_+$, the map
$B \mapsto S_\alpha(A\,\|\,B)$ is convex on $\mathcal{B}(\mathcal{H})_+$ for every
$\alpha \in [0, 2]$.

Proof: For $\alpha \in [0, 1]$, the assertion is a weaker version
of (16), and hence for the rest we assume that $\alpha \in (1, 2]$. Let
$A, B_1, B_2 \in \mathcal{B}(\mathcal{H})_+$; it suffices to show that

$$S_\alpha\big(A\,\|\,\eta(B_1 + \varepsilon I) + (1 - \eta)(B_2 + \varepsilon I)\big) \le \eta\,S_\alpha(A\,\|\,B_1 + \varepsilon I) + (1 - \eta)\,S_\alpha(A\,\|\,B_2 + \varepsilon I) \tag{17}$$

holds for every $\eta \in (0, 1)$. Taking the limit $\varepsilon \searrow 0$ will
then give the desired convexity inequality. Note that (17) is
equivalent to

$$\log \omega\big((\eta(B_1 + \varepsilon I) + (1 - \eta)(B_2 + \varepsilon I))^{1-\alpha}\big) \le \eta \log \omega\big((B_1 + \varepsilon I)^{1-\alpha}\big) + (1 - \eta) \log \omega\big((B_2 + \varepsilon I)^{1-\alpha}\big),$$

where $\omega(X) := \operatorname{Tr} A^\alpha X$, $X \in \mathcal{B}(\mathcal{H})$,
is a positive linear functional on $\mathcal{B}(\mathcal{H})$. Proposition 1.1 in
[21] states that the functional $X \mapsto \log \omega(f(X))$,
$X \in \mathcal{B}(\mathcal{H})_{++}$, is convex
whenever $\omega$ is a positive linear functional and $f$ is a
non-negative operator monotone decreasing function on $(0, +\infty)$.
Applying this to the $\omega$ above and $f(x) := x^{1-\alpha}$, $x > 0$, the
assertion follows. ■
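Theorem II.1 can be sanity-checked on a single convex combination. The sketch below is only a spot check at one parameter value, $\alpha = 1.5$, with arbitrarily chosen full-rank qubit operators; `mpow` and `renyi` are hypothetical helper names and logarithms are base 2 by assumption.

```python
import numpy as np

def mpow(A, t, tol=1e-12):
    """Power of a PSD matrix taken on its support only."""
    w, v = np.linalg.eigh(A)
    keep = w > tol
    return (v[:, keep] * w[keep] ** t) @ v[:, keep].conj().T

def renyi(A, B, alpha):
    """S_alpha(A||B) in bits for full-rank B and alpha != 1."""
    q = np.trace(mpow(A, alpha) @ mpow(B, 1.0 - alpha)).real
    return np.log2(q) / (alpha - 1.0)

A = np.array([[0.8, 0.1], [0.1, 0.2]])
B1 = np.array([[0.3, -0.2], [-0.2, 0.7]])
B2 = np.array([[0.6, 0.25], [0.25, 0.4]])
alpha, eta = 1.5, 0.3

# Convexity in the second argument: value at the mixture vs. mixture of values.
at_mixture = renyi(A, eta * B1 + (1 - eta) * B2, alpha)
mixture_of_values = eta * renyi(A, B1, alpha) + (1 - eta) * renyi(A, B2, alpha)
```

The value at the mixture does not exceed the mixture of the values, in line with the theorem.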
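By computing its second derivative, it is easy to see that the
function $\alpha \mapsto \log \operatorname{Tr} A^\alpha B^{1-\alpha}$,
$\alpha \in \mathbb{R}$, is convex on $\mathbb{R}$ for any
fixed $A, B \in \mathcal{B}(\mathcal{H})_+$, which yields by a simple computation
the following:

Lemma II.2. If $\operatorname{Tr} A \le 1$ then the function
$\alpha \mapsto S_\alpha(A\,\|\,B)$ is
monotonically increasing on $[0, 1)$ and on $(1, +\infty)$. Moreover,
if $\operatorname{Tr} A = 1$ then $\alpha \mapsto S_\alpha(A\,\|\,B)$ is
monotonically increasing on $[0, +\infty)$.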
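The monotonicity in $\alpha$ is easy to observe numerically. The following sketch evaluates $S_\alpha(A\,\|\,B)$ on a grid that skips $\alpha = 1$, for two arbitrarily chosen full-rank density matrices; the helper names are hypothetical and logarithms are base 2 by assumption.

```python
import numpy as np

def mpow(A, t, tol=1e-12):
    """Power of a PSD matrix taken on its support only."""
    w, v = np.linalg.eigh(A)
    keep = w > tol
    return (v[:, keep] * w[keep] ** t) @ v[:, keep].conj().T

def renyi(A, B, alpha):
    """S_alpha(A||B) in bits for full-rank B and alpha != 1."""
    q = np.trace(mpow(A, alpha) @ mpow(B, 1.0 - alpha)).real
    return np.log2(q) / (alpha - 1.0)

A = np.array([[0.8, 0.1], [0.1, 0.2]])      # unit trace
B = np.array([[0.3, -0.2], [-0.2, 0.7]])    # unit trace

# Grid on both sides of alpha = 1, where the formula itself is undefined.
alphas = np.concatenate([np.linspace(0.05, 0.95, 10), np.linspace(1.05, 3.0, 10)])
values = np.array([renyi(A, B, a) for a in alphas])
increments = np.diff(values)
```

Since $\operatorname{Tr} A = 1$ here, the increments are non-negative across the whole grid, including the step that jumps over $\alpha = 1$.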

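Proposition II.3. Assume that $\operatorname{Tr} A \le 1$ and
$\operatorname{Tr} B \le 1$. For $\alpha \in (0, 1)$,
$S_\alpha(A\,\|\,B) \ge 0$ with equality if and only if $A = B$
and $\operatorname{Tr} A = 1$. If $A$ is a density operator and
$\operatorname{Tr} B \le 1$ then,
for all $\alpha \in [1, +\infty)$, $S_\alpha(A\,\|\,B) \ge 0$, and
$S_\alpha(A\,\|\,B) = 0$ if and only if $A = B$. Moreover, if both $A$ and $B$ are
density operators then the Csiszár-Pinsker inequality

$$S_\alpha(A\,\|\,B) \ge \tfrac{1}{2}\,\|A - B\|_1^2$$

holds for all $\alpha \ge 1$.

Proof: Assume first that $\alpha \in [0, 1)$. Then, by Hölder's
inequality,

$$\operatorname{Tr} A^\alpha B^{1-\alpha} \le (\operatorname{Tr} A)^\alpha (\operatorname{Tr} B)^{1-\alpha} \le 1,$$

from which $S_\alpha(A\,\|\,B) = \frac{1}{\alpha - 1}\log \operatorname{Tr} A^\alpha B^{1-\alpha} \ge 0$.
Obviously, $S_\alpha(A\,\|\,B) = 0$ if and only if
$\operatorname{Tr} A^\alpha B^{1-\alpha} = 1$. By
the above, this is true if and only if
$\operatorname{Tr} A = \operatorname{Tr} B = 1$, and
Hölder's inequality holds with equality. The latter condition
yields that $B = \lambda A$ for some $\lambda > 0$, and
$\operatorname{Tr} A = \operatorname{Tr} B$ yields
$\lambda = 1$. Lemma II.2 yields the assertion on strict positivity
for $\alpha \ge 1$ when $A$ is a density operator. The Csiszár-Pinsker
inequality holds for $\alpha = 1$ (cf. Theorem 3.1 in [22]) and hence,
by Lemma II.2, for all $\alpha \ge 1$. ■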
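For a density operator $\rho \in \mathcal{S}(\mathcal{H})$, its Rényi
$\alpha$-entropy for $\alpha \in [0, +\infty)$ is

$$S_\alpha(\rho) := \log d - S_\alpha(\rho\,\|\,(1/d)I).$$

For $\alpha \ne 1$ we have
$S_\alpha(\rho) = \frac{1}{1-\alpha}\log \operatorname{Tr} \rho^\alpha$, which is
easily seen to be non-negative, and $S_\alpha(\rho\,\|\,(1/d)I) \ge 0$ yields that

$$0 \le S_\alpha(\rho) \le \log d, \qquad \alpha \in [0, +\infty). \tag{18}$$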
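The bounds in (18) are attained at the two extremes, which the following small sketch illustrates. The function name `renyi_entropy` is hypothetical and logarithms are base 2 by assumption, so that $\log d = 1$ for a qubit.

```python
import numpy as np

def renyi_entropy(rho, alpha):
    """Renyi alpha-entropy (1/(1-alpha)) log2 Tr rho^alpha, for alpha != 1."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]                      # powers are taken on the support only
    return np.log2(np.sum(w ** alpha)) / (1.0 - alpha)

pure = np.diag([1.0, 0.0])                # lower bound in (18): entropy 0
mixed = np.eye(2) / 2                     # upper bound in (18): entropy log2(d) = 1
generic = np.array([[0.8, 0.1], [0.1, 0.2]])

s_pure = renyi_entropy(pure, 2.0)
s_mixed = renyi_entropy(mixed, 0.5)
s_generic = [renyi_entropy(generic, a) for a in (0.5, 2.0, 5.0)]
```

The generic state lands strictly between the two bounds for each of the sampled parameters.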



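The Hoeffding distance of states $\rho, \sigma \in \mathcal{S}(\mathcal{H})$ with
parameter $r \ge 0$ is defined as

$$H_r(\rho\,\|\,\sigma) := \sup_{0 \le \alpha < 1}\Big\{-\frac{\alpha r}{1-\alpha} + S_\alpha(\rho\,\|\,\sigma)\Big\} = \sup_{0 \le \alpha < 1} \frac{-\alpha r - \psi(\alpha)}{1 - \alpha} = \sup_{s \ge 0}\{-sr - \tilde\psi(s)\}, \tag{19}$$

where

$$\psi(\alpha) := \log \operatorname{Tr} \rho^\alpha \sigma^{1-\alpha},\quad \alpha \in \mathbb{R}, \qquad \tilde\psi(s) := (1 + s)\,\psi\big(s/(1 + s)\big),\quad s > -1. \tag{20}$$

Convexity of $\psi$ yields the convexity of $\tilde\psi$, and a simple
computation shows that
$\psi(0) + \psi'(0) = \tilde\psi'(0) \le \lim_{s \to +\infty} \tilde\psi'(s) = \psi(1) \le 0$.
Hence,

$$H_r(\rho\,\|\,\sigma) = -\psi(0), \qquad -r \le \psi(0) + \psi'(0).$$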


The function $r \mapsto H_r(\rho\,\|\,\sigma)$ is the Legendre-Fenchel
transform (up to the sign of the variable) of $\tilde\psi$ on $[0, +\infty)$
and hence it is convex on $[0, +\infty)$. Using the bipolar theorem
for convex functions [23, Proposition 4.1], we get

$$S_\alpha(\rho\,\|\,\sigma) = \frac{1}{\alpha - 1}\,\sup_{r \ge 0}\big\{-\alpha r - (1 - \alpha)\,H_r(\rho\,\|\,\sigma)\big\}, \qquad 0 \le \alpha < 1.$$

That is, the Rényi relative entropies with parameter in $[0, 1)$
and the Hoeffding distances with parameter $r \ge 0$ mutually
determine each other. Note that $r \mapsto H_r(\rho\,\|\,\sigma)$ is monotonic
decreasing, and

$$S_0(\rho\,\|\,\sigma) = \lim_{r \to +\infty} H_r(\rho\,\|\,\sigma) \le H_0(\rho\,\|\,\sigma) = S_1(\rho\,\|\,\sigma).$$
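Since no closed formula is available, the Hoeffding distance is typically evaluated by a one-dimensional search over $\alpha$. The sketch below approximates the supremum in (19) on a finite grid, which is a simplifying assumption that can only underestimate the true value; all helper names are hypothetical and logarithms are base 2 by assumption.

```python
import numpy as np

def mpow(A, t, tol=1e-12):
    """Power of a PSD matrix taken on its support only."""
    w, v = np.linalg.eigh(A)
    keep = w > tol
    return (v[:, keep] * w[keep] ** t) @ v[:, keep].conj().T

def psi(rho, sigma, a):
    """psi(alpha) = log2 Tr rho^alpha sigma^(1-alpha), as in (20)."""
    return np.log2(np.trace(mpow(rho, a) @ mpow(sigma, 1.0 - a)).real)

def hoeffding(rho, sigma, r, grid=np.linspace(0.0, 0.999, 1000)):
    """Grid approximation (from below) of the Hoeffding distance H_r in (19)."""
    return max((-a * r - psi(rho, sigma, a)) / (1.0 - a) for a in grid)

def rel_entropy(rho, sigma):
    """S(rho||sigma) in bits for full-rank states."""
    def logm2(A):
        w, v = np.linalg.eigh(A)
        return (v * np.log2(w)) @ v.conj().T
    return np.trace(rho @ (logm2(rho) - logm2(sigma))).real

rho = np.array([[0.8, 0.1], [0.1, 0.2]])
sigma = np.array([[0.3, -0.2], [-0.2, 0.7]])

h_values = [hoeffding(rho, sigma, r) for r in (0.0, 0.1, 0.5, 2.0)]
s_one = rel_entropy(rho, sigma)
```

The computed values decrease in $r$, stay below $S_1(\rho\,\|\,\sigma)$, and remain non-negative because $S_0(\rho\,\|\,\sigma) = 0$ for a full-rank $\rho$.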

Finally, the max-relative entropy of $A, B \in \mathcal{B}(\mathcal{H})_+$ was
defined in [24] as $S_{\max}(A\,\|\,B) := \inf\{\gamma : A \le 2^\gamma B\}$. One
can easily see that if $A$ and $B$ commute then
$S_{\max}(A\,\|\,B) = S_\infty(A\,\|\,B) := \lim_{\alpha \to +\infty} S_\alpha(A\,\|\,B)$,
but for non-commuting $A$ and $B$,
$S_{\max}(A\,\|\,B) < S_\infty(A\,\|\,B)$ might happen [14].
In general, $S_2(A\,\|\,B) \le S_{\max}(A\,\|\,B) \le S_\infty(A\,\|\,B)$ [25],
[26].
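For an invertible $B$, the infimum in the definition of $S_{\max}$ is attained at the logarithm of the largest eigenvalue of $B^{-1/2}AB^{-1/2}$, which gives a direct way to compute it. The sketch below relies on that standard reformulation and on the assumption that $B$ is invertible; the function names are hypothetical and logarithms are base 2, matching the $2^\gamma$ in the definition.

```python
import numpy as np

def mpow(A, t, tol=1e-12):
    """Power of a PSD matrix taken on its support only."""
    w, v = np.linalg.eigh(A)
    keep = w > tol
    return (v[:, keep] * w[keep] ** t) @ v[:, keep].conj().T

def s_max(A, B):
    """Max-relative entropy inf{gamma : A <= 2^gamma B}, assuming B is invertible."""
    b_inv_sqrt = mpow(B, -0.5)
    return np.log2(np.linalg.eigvalsh(b_inv_sqrt @ A @ b_inv_sqrt).max())

A = np.array([[0.8, 0.1], [0.1, 0.2]])
B = np.array([[0.3, -0.2], [-0.2, 0.7]])

gamma = s_max(A, B)
s_two = np.log2(np.trace(A @ A @ mpow(B, -1.0)).real)        # S_2(A||B)
slack = np.linalg.eigvalsh(2.0 ** gamma * B - A).min()        # A <= 2^gamma B holds iff >= 0

# Commuting example where the value is easy to verify by hand: max(0.5/0.25, 0.5/0.75) = 2.
gamma_commuting = s_max(np.diag([0.5, 0.5]), np.diag([0.25, 0.75]))
```

The first inequality of the chain above, $S_2 \le S_{\max}$, can be read off by comparing `s_two` with `gamma`.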



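III. Cutoff rates for quantum state discrimination

Consider the asymptotic binary state discrimination problem
with null hypothesis $\rho$ and alternative hypothesis $\sigma$, as
described in the Introduction. We will consider the scenario
where the error probability of the second kind is minimized
under an exponential constraint on the error probability of the
first kind; the quantity of interest in this case is

$$\beta_{n,r} := \min\{\beta_n(T) \mid T \in \mathcal{B}(\mathcal{H}^{\otimes n}),\ 0 \le T \le I,\ \text{and } \alpha_n(T) \le 2^{-nr}\},$$

where $r$ is some fixed positive number. In general, there is no
closed formula to express $\beta_{n,r}$ or the optimal measurement in
terms of $\rho$ and $\sigma$ for a finite $n$, but it becomes possible in
the limit of large $n$. We define the Hoeffding exponents for a
parameter $r \ge 0$ as

$$\underline{h}_r(\rho\,\|\,\sigma) := \inf_{\{T_n\}}\Big\{\liminf_{n \to \infty} \tfrac{1}{n}\log \beta_n(T_n) \ \Big|\ \limsup_{n \to \infty} \tfrac{1}{n}\log \alpha_n(T_n) < -r\Big\},$$

$$\overline{h}_r(\rho\,\|\,\sigma) := \inf_{\{T_n\}}\Big\{\limsup_{n \to \infty} \tfrac{1}{n}\log \beta_n(T_n) \ \Big|\ \limsup_{n \to \infty} \tfrac{1}{n}\log \alpha_n(T_n) < -r\Big\},$$

$$h_r(\rho\,\|\,\sigma) := \inf_{\{T_n\}}\Big\{\lim_{n \to \infty} \tfrac{1}{n}\log \beta_n(T_n) \ \Big|\ \limsup_{n \to \infty} \tfrac{1}{n}\log \alpha_n(T_n) < -r\Big\}.$$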
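It is easy to see that

$$\underline{h}_r(\rho\,\|\,\sigma) \le \liminf_{n \to \infty} \frac{1}{n}\log \beta_{n,r} \le \limsup_{n \to \infty} \frac{1}{n}\log \beta_{n,r} \le \overline{h}_r(\rho\,\|\,\sigma).$$

Moreover, as it was shown in [3]-[6], we have

$$\underline{h}_r(\rho\,\|\,\sigma) = \overline{h}_r(\rho\,\|\,\sigma) = h_r(\rho\,\|\,\sigma) = -H_r(\rho\,\|\,\sigma), \tag{21}$$

where $H_r(\rho\,\|\,\sigma)$ is the Hoeffding distance defined in (19),
and hence, the limit $\lim_{n \to \infty} \frac{1}{n}\log \beta_{n,r}$ exists and

$$\lim_{n \to \infty} \frac{1}{n}\log \beta_{n,r} = -H_r(\rho\,\|\,\sigma).$$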
Note that while the above result gives the exact value of the
optimal exponential decay rate for every $r$, the evaluation of
$H_r(\rho\,\|\,\sigma)$ is a non-trivial task even for one single $r$. Indeed,
there is no closed formula known for the Hoeffding distance
in general, and, as the definition (19) shows, in order to
compute $H_r(\rho\,\|\,\sigma)$, one has to know in principle all the Rényi
relative entropies $S_\alpha(\rho\,\|\,\sigma)$ for every $\alpha \in (0, 1)$, and
solve an optimization problem. It is thus natural to look for simple
approximants of the function $r \mapsto H_r(\rho\,\|\,\sigma)$ for given $\rho$ and
$\sigma$. Following [8], for a $\kappa < 0$ we define the generalized
$\kappa$-cutoff rate $C_\kappa(\rho\,\|\,\sigma)$ as the supremum of all
$r_0 \ge 0$ that satisfy

$$h_r(\rho\,\|\,\sigma) \le \kappa(r_0 - r), \qquad r > 0. \tag{22}$$

That is, we are looking for a linear approximation of
$r \mapsto H_r(\rho\,\|\,\sigma)$ which is optimal among all the linear functions
with a given slope. Note that (22) gives a restriction only
for $r < r_0$, as otherwise the right-hand side is non-negative
and the inequality holds trivially. That is, one can ensure an
exponential decay rate at least as fast as given in the right-hand
side of (22) whenever $r < r_0 < C_\kappa(\rho\,\|\,\sigma)$. Moreover, as the
following Theorem shows, the cutoff rate is easy to evaluate,
as it is equal to a Rényi relative entropy with a given parameter
depending on $\kappa$.

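Theorem III.1. For every $\kappa < 0$,

$$C_\kappa(\rho\,\|\,\sigma) = \frac{1}{|\kappa|}\,S_{\frac{|\kappa|}{1+|\kappa|}}(\rho\,\|\,\sigma) = S_{\frac{1}{1+|\kappa|}}(\sigma\,\|\,\rho). \tag{23}$$

Proof: If $\operatorname{supp}\rho \perp \operatorname{supp}\sigma$ then all the
quantities in (23) are $+\infty$ and the assertion holds trivially. Hence, for the
rest we assume that $\operatorname{supp}\rho$ is not orthogonal to
$\operatorname{supp}\sigma$. Note that
the second identity follows from (11). Let $\kappa < 0$ be fixed. By
(21), our goal is to determine the largest $r_0$ such that

$$-|\kappa| r + |\kappa| r_0 \le -h_r(\rho\,\|\,\sigma) = H_r(\rho\,\|\,\sigma), \qquad r > 0.$$

By (19), $H_r(\rho\,\|\,\sigma) \ge -|\kappa| r - \tilde\psi(|\kappa|)$ for every
$r > 0$, where $\tilde\psi$ is given in (20). On the other hand, for
$r_\kappa := -\tilde\psi'(|\kappa|)$
we have $\tilde\psi(s) \ge \tilde\psi(|\kappa|) + (s - |\kappa|)\,\tilde\psi'(|\kappa|)$,
$s > 0$, due to the convexity of $\tilde\psi$ and hence,

$$H_{r_\kappa}(\rho\,\|\,\sigma) = \sup_{s \ge 0}\{s\,\tilde\psi'(|\kappa|) - \tilde\psi(s)\} = |\kappa|\,\tilde\psi'(|\kappa|) - \tilde\psi(|\kappa|) = -|\kappa| r_\kappa - \tilde\psi(|\kappa|).$$

Therefore,

$$C_\kappa(\rho\,\|\,\sigma) = -\frac{1}{|\kappa|}\,\tilde\psi(|\kappa|) = -\frac{1 + |\kappa|}{|\kappa|}\,\psi\Big(\frac{|\kappa|}{1 + |\kappa|}\Big) = \frac{1}{|\kappa|}\,S_{\frac{|\kappa|}{1+|\kappa|}}(\rho\,\|\,\sigma). \qquad ■$$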



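The following Corollary is immediate from Theorem III.1,
and gives an operational interpretation of the Rényi relative
entropies with parameter between 0 and 1:

Corollary III.2. For every $\rho, \sigma \in \mathcal{S}(\mathcal{H})$ and every
$\alpha \in (0, 1)$,

$$S_\alpha(\rho\,\|\,\sigma) = \frac{\alpha}{1 - \alpha}\,C_{\frac{\alpha}{\alpha - 1}}(\rho\,\|\,\sigma) = C_{\frac{\alpha - 1}{\alpha}}(\sigma\,\|\,\rho).$$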
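The geometric content of Theorem III.1 is that the line $r \mapsto |\kappa|(C_\kappa - r)$ touches the graph of $r \mapsto H_r(\rho\,\|\,\sigma)$ from below. The sketch below checks this numerically under explicit assumptions: the Hoeffding distance is approximated on a finite grid of $\alpha$ values to which the parameter $|\kappa|/(1 + |\kappa|)$ is added by hand, the states and $\kappa = -2$ are arbitrary, the helper names are hypothetical, and logarithms are base 2.

```python
import numpy as np

def mpow(A, t, tol=1e-12):
    """Power of a PSD matrix taken on its support only."""
    w, v = np.linalg.eigh(A)
    keep = w > tol
    return (v[:, keep] * w[keep] ** t) @ v[:, keep].conj().T

def psi(rho, sigma, a):
    """psi(alpha) = log2 Tr rho^alpha sigma^(1-alpha)."""
    return np.log2(np.trace(mpow(rho, a) @ mpow(sigma, 1.0 - a)).real)

def renyi(rho, sigma, a):
    """S_alpha(rho||sigma) in bits for alpha in [0, 1)."""
    return psi(rho, sigma, a) / (a - 1.0)

rho = np.array([[0.8, 0.1], [0.1, 0.2]])
sigma = np.array([[0.3, -0.2], [-0.2, 0.7]])

kappa = -2.0
k = abs(kappa)
a_k = k / (1.0 + k)                                  # Renyi parameter attached to kappa
cutoff = renyi(rho, sigma, a_k) / k                  # first expression in (23)
cutoff_alt = renyi(sigma, rho, 1.0 / (1.0 + k))      # second expression in (23)

grid = np.append(np.linspace(0.0, 0.99, 100), a_k)   # make sure a_k itself is sampled

def hoeffding(r):
    """Grid approximation (from below) of H_r(rho||sigma)."""
    return max((-a * r - psi(rho, sigma, a)) / (1.0 - a) for a in grid)

# H_r minus the linear lower bound |kappa| (C_kappa - r), sampled over r.
slack = [hoeffding(r) - k * (cutoff - r) for r in np.linspace(0.0, 2.0, 21)]
```

The slack is non-negative for every sampled $r$, and the two expressions for the cutoff rate agree.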



In the above, we considered the scenario where the consecutive
trials are independent and identically distributed, and
hence the state describing the outcome probabilities of $n$
trials is a state of the form $\rho^{\otimes n}$ or $\sigma^{\otimes n}$. In a
more general scenario, that encompasses correlated trials, one can consider a
sequence of Hilbert spaces
$\vec{\mathcal{H}} := \{\mathcal{H}_n\}_{n \in \mathbb{N}}$ and two sequences
of states $\vec{\rho} := \{\rho_n\}_{n \in \mathbb{N}}$ and
$\vec{\sigma} := \{\sigma_n\}_{n \in \mathbb{N}}$. The goal is
again to analyze the asymptotic performance of a decision
scheme for deciding between $\rho_n$ and $\sigma_n$ for each
$n \in \mathbb{N}$. The error probabilities $\alpha_n$ and $\beta_n$ can be
defined in the same way as above, and in analogy with the above problem,
the limit $\lim_{n \to \infty} (1/c(n)) \log \beta_{n,r}$ can be considered, where
$c : \mathbb{N} \to \mathbb{N}$ is some monotonically increasing function such
that $\lim_{n \to \infty} c(n) = +\infty$. The following was shown in [6]:

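Theorem III.3. Assume that the limit
$\psi(\alpha) := \lim_{n \to \infty} \frac{1}{c(n)}\,(\alpha - 1)\,S_\alpha(\rho_n\,\|\,\sigma_n)$
exists for all $\alpha \in [0, 1)$ and
the convergence is uniform on $[0, 1)$. Assume, moreover, that
$\psi$ is differentiable on $(0, 1)$. Then,

$$\lim_{n \to \infty} \frac{1}{c(n)} \log \beta_{n,r} = -\lim_{n \to \infty} \frac{1}{c(n)}\,H_{c(n) r}(\rho_n\,\|\,\sigma_n) =: -H_r(\vec{\rho}\,\|\,\vec{\sigma}).$$

Moreover,

$$H_r(\vec{\rho}\,\|\,\vec{\sigma}) = \sup_{0 \le \alpha < 1}\Big\{-\frac{\alpha r}{1 - \alpha} + S_\alpha(\vec{\rho}\,\|\,\vec{\sigma})\Big\},$$

where

$$S_\alpha(\vec{\rho}\,\|\,\vec{\sigma}) := \frac{\psi(\alpha)}{\alpha - 1} = \lim_{n \to \infty} \frac{1}{c(n)}\,S_\alpha(\rho_n\,\|\,\sigma_n).$$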
ji-s-oo c(n) 



A particular example that satisfies the conditions of Theorem
III.3 is the case where $\rho_n$ and $\sigma_n$ are the $n$-step restrictions
of classical ergodic Markov chains with finite state-space [6].
Physically motivated examples can be obtained by considering
$\rho_n$ and $\sigma_n$ to be finite-block restrictions of temperature states
of non-interacting fermionic and bosonic systems on cubic
lattices [27], [28].

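The cutoff rates $C_\kappa(\vec{\rho}\,\|\,\vec{\sigma})$ can again be defined in the
same way as in (22) (with the scale $1/n$ replaced with $1/c(n)$ in the
definition of $h_r(\vec{\rho}\,\|\,\vec{\sigma})$). The same argument as in the
proof of Theorem III.1 leads to the following:

Theorem III.4. Under the assumptions of Theorem III.3, we
have

$$C_\kappa(\vec{\rho}\,\|\,\vec{\sigma}) = \frac{1}{|\kappa|}\,S_{\frac{|\kappa|}{1+|\kappa|}}(\vec{\rho}\,\|\,\vec{\sigma}) = S_{\frac{1}{1+|\kappa|}}(\vec{\sigma}\,\|\,\vec{\rho})$$

for every $\kappa < 0$, or equivalently, for every $\alpha \in (0, 1)$,

$$S_\alpha(\vec{\rho}\,\|\,\vec{\sigma}) = \frac{\alpha}{1 - \alpha}\,C_{\frac{\alpha}{\alpha - 1}}(\vec{\rho}\,\|\,\vec{\sigma}) = C_{\frac{\alpha - 1}{\alpha}}(\vec{\sigma}\,\|\,\vec{\rho}).$$

SUBMITTED TO THE IEEE TRANSACTIONS ON INFORMATION THEORY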

IV. Equivalence of capacities

Let $W : \mathcal{X} \to \mathcal{S}(\mathcal{H})$ be a classical-quantum channel as
in the Introduction. Our aim in this section is to show that
the capacities defined in (8)-(10) are equal to each other
when $D = S_\alpha$ is a Rényi relative entropy with parameter
$\alpha \in (0, 2]$. We will assume that $\operatorname{ran} W$ is compact in
$\mathcal{S}(\mathcal{H})$.
This assumption is satisfied when $W$ is a CPTP map on the
state space of an input Hilbert space as well as when $\mathcal{X}$ is a
finite set.

Note that $\mathcal{S}(\mathcal{H})$ is a compact convex subset of the Euclidean
space $\mathcal{B}(\mathcal{H})_{sa}$ (with the Hilbert-Schmidt norm). Let
$\mathcal{K}$ be a compact subset of $\mathcal{S}(\mathcal{H})$ and
$\mathcal{M}(\mathcal{K})$ be the set of all Borel
probability measures on $\mathcal{K}$. Let $C_{\mathbb{R}}(\mathcal{K})$ be the real
Banach space of all real continuous functions on $\mathcal{K}$ with the sup-norm;
then $\mathcal{M}(\mathcal{K})$ is identified with a w*-compact convex subset of
the dual Banach space $C_{\mathbb{R}}(\mathcal{K})^*$. We also introduce the subset
$\mathcal{M}_f(\mathcal{K})$ of $\mathcal{M}(\mathcal{K})$, consisting of finitely
supported measures.

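For every $\alpha \in (0, 2] \setminus \{1\}$ and $\varepsilon \ge 0$, define the
functions $f_{\alpha,\varepsilon}$ and $g_{\alpha,\varepsilon}$ on
$\mathcal{M}(\mathcal{K}) \times \mathcal{S}(\mathcal{H})$ by

$$f_{\alpha,\varepsilon}(p, \sigma) := \int_{\mathcal{K}} S_\alpha(\rho\,\|\,\sigma + \varepsilon I)\,dp(\rho), \qquad g_{\alpha,\varepsilon}(p, \sigma) := \int_{\mathcal{K}} Q_\alpha(\rho\,\|\,\sigma + \varepsilon I)\,dp(\rho).$$

Note that for every fixed $\sigma$, the functions
$S_\alpha(\cdot\,\|\,\sigma + \varepsilon I)$ and
$Q_\alpha(\cdot\,\|\,\sigma + \varepsilon I)$ are continuous for $\varepsilon > 0$ and,
by (12) and (13), are lower semicontinuous for $\varepsilon = 0$. Hence, the integrals
defining $f_{\alpha,\varepsilon}$ and $g_{\alpha,\varepsilon}$ exist for all
$\varepsilon \ge 0$. Furthermore, by
(12), (13), and Beppo Levi's theorem,

$$f_{\alpha,0}(p, \sigma) = \lim_{\varepsilon \searrow 0} f_{\alpha,\varepsilon}(p, \sigma) = \sup_{\varepsilon > 0} f_{\alpha,\varepsilon}(p, \sigma), \qquad p \in \mathcal{M}(\mathcal{K}), \tag{24}$$

and the same holds if we replace $f_{\alpha,0}$ with $g_{\alpha,0}$ and
$f_{\alpha,\varepsilon}$ with $g_{\alpha,\varepsilon}$.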

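Lemma IV.1. For every $\sigma \in \mathcal{S}(\mathcal{H})$ and $\varepsilon > 0$,
$f_{\alpha,\varepsilon}(\cdot, \sigma)$ and
$g_{\alpha,\varepsilon}(\cdot, \sigma)$ are affine and continuous on
$\mathcal{M}(\mathcal{K})$.

Proof: The claims about the affinity are obvious, and the
continuity of the functions $S_\alpha(\cdot\,\|\,\sigma + \varepsilon I)$ and
$Q_\alpha(\cdot\,\|\,\sigma + \varepsilon I)$
yields, by definition, that $f_{\alpha,\varepsilon}(\cdot, \sigma)$ and
$g_{\alpha,\varepsilon}(\cdot, \sigma)$ are continuous
in the w*-topology. ■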

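Lemma IV.2. For every $p \in \mathcal{M}(\mathcal{K})$ and $\varepsilon > 0$,
$f_{\alpha,\varepsilon}(p, \cdot)$ and
$g_{\alpha,\varepsilon}(p, \cdot)$ are convex and continuous on
$\mathcal{S}(\mathcal{H})$.

Proof: Convexity follows from Theorem II.1 and (15).
Let $\{\sigma_k\}_{k \in \mathbb{N}}$ be a sequence in $\mathcal{S}(\mathcal{H})$,
converging to some $\sigma_0 \in \mathcal{S}(\mathcal{H})$. Let
$f_k(\rho) := \operatorname{Tr}\rho^\alpha(\sigma_k + \varepsilon I)^{1-\alpha}$ and
$f(\rho) := \operatorname{Tr}\rho^\alpha(\sigma_0 + \varepsilon I)^{1-\alpha}$,
$\rho \in \mathcal{K}$. Since

$$\big|\operatorname{Tr}\rho^\alpha(\sigma_k + \varepsilon I)^{1-\alpha} - \operatorname{Tr}\rho^\alpha(\sigma_0 + \varepsilon I)^{1-\alpha}\big| \le \operatorname{Tr}\rho^\alpha \cdot \big\|(\sigma_k + \varepsilon I)^{1-\alpha} - (\sigma_0 + \varepsilon I)^{1-\alpha}\big\|_\infty,$$

and $\operatorname{Tr}\rho^\alpha \le d$ for every $\alpha > 0$, we see that
$\lim_k f_k(\rho) = f(\rho)$
uniformly in $\rho$. This yields the continuity of
$g_{\alpha,\varepsilon}(p, \cdot)$.

For $\alpha \in (1, 2]$,
$f(\rho) \ge \operatorname{Tr}\rho^\alpha (1 + \varepsilon)^{1-\alpha} \ge (1 + \varepsilon)^{1-\alpha} d^{1-\alpha}$,
due to (18). For $\alpha \in (0, 1)$, the operator monotonicity
of the function $x \mapsto x^{1-\alpha}$, $x > 0$, yields that
$f(\rho) \ge \operatorname{Tr}\rho^\alpha(\varepsilon I)^{1-\alpha} \ge \varepsilon^{1-\alpha}$
for all $\rho \in \mathcal{K}$. Since

$$|f_k(\rho) - f(\rho)| = f(\rho)\,\Big|\frac{f_k(\rho)}{f(\rho)} - 1\Big| \ge \inf_{\rho \in \mathcal{K}} f(\rho) \cdot \Big|\frac{f_k(\rho)}{f(\rho)} - 1\Big|,$$

we see that $f_k(\rho)/f(\rho)$ converges to 1 uniformly in $\rho$ as
$k \to \infty$, and hence

$$S_\alpha(\rho\,\|\,\sigma_k + \varepsilon I) - S_\alpha(\rho\,\|\,\sigma_0 + \varepsilon I) = \frac{1}{\alpha - 1}\log\frac{f_k(\rho)}{f(\rho)}$$

converges to 0 uniformly in $\rho$, due to which
$\lim_{k \to \infty} f_{\alpha,\varepsilon}(p, \sigma_k) = f_{\alpha,\varepsilon}(p, \sigma_0)$. ■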

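To simplify notation, we fix an $\alpha \in (0, 2] \setminus \{1\}$ for the rest.
We have the following:

Proposition IV.3. For every $\varepsilon > 0$, there exists a
$\sigma_\varepsilon \in \mathcal{S}(\mathcal{H})$ such that

$$\max_{p \in \mathcal{M}(\mathcal{K})} f_{\alpha,\varepsilon}(p, \sigma_\varepsilon) = \min_{\sigma \in \mathcal{S}(\mathcal{H})}\ \max_{p \in \mathcal{M}(\mathcal{K})} f_{\alpha,\varepsilon}(p, \sigma) = \max_{p \in \mathcal{M}(\mathcal{K})}\ \min_{\sigma \in \mathcal{S}(\mathcal{H})} f_{\alpha,\varepsilon}(p, \sigma) \tag{25}$$

$$= \min_{\sigma \in \mathcal{S}(\mathcal{H})}\ \max_{\rho \in \mathcal{K}} S_\alpha(\rho\,\|\,\sigma + \varepsilon I) = \max_{\rho \in \mathcal{K}} S_\alpha(\rho\,\|\,\sigma_\varepsilon + \varepsilon I). \tag{26}$$

Moreover, the same relations hold if the maxima over $\mathcal{M}(\mathcal{K})$
are replaced with maxima over $\mathcal{M}_f(\mathcal{K})$.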

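Proof: For a fixed $\sigma$, $f_{\alpha,\varepsilon}(\cdot, \sigma)$ is continuous
and, consequently,
$p \mapsto \min_{\sigma \in \mathcal{S}(\mathcal{H})} f_{\alpha,\varepsilon}(p, \sigma)$
is upper semicontinuous and therefore they reach their suprema
on the compact set $\mathcal{M}(\mathcal{K})$. Moreover,
$f_{\alpha,\varepsilon}(p, \sigma) \le \sup_{\rho \in \operatorname{supp} p} S_\alpha(\rho\,\|\,\sigma + \varepsilon I)$,
$p \in \mathcal{M}(\mathcal{K})$, $\sigma \in \mathcal{S}(\mathcal{H})$,
yields that the maximum of $f_{\alpha,\varepsilon}(\cdot, \sigma)$ on
$\mathcal{M}(\mathcal{K})$ is reached
at a Dirac probability measure and hence,

$$\max_{p \in \mathcal{M}(\mathcal{K})} f_{\alpha,\varepsilon}(p, \sigma) = \max_{\rho \in \mathcal{K}} S_\alpha(\rho\,\|\,\sigma + \varepsilon I) = \max_{p \in \mathcal{M}_f(\mathcal{K})} f_{\alpha,\varepsilon}(p, \sigma) \tag{27}$$

for every $\sigma \in \mathcal{S}(\mathcal{H})$. Continuity of
$f_{\alpha,\varepsilon}(p, \cdot)$ yields that
$\sigma \mapsto \max_{p \in \mathcal{M}(\mathcal{K})} f_{\alpha,\varepsilon}(p, \sigma)$
is lower semicontinuous on $\mathcal{S}(\mathcal{H})$
and hence it reaches its infimum at some point $\sigma_\varepsilon$, which yields
$\min_{\sigma \in \mathcal{S}(\mathcal{H})} \max_{p \in \mathcal{M}(\mathcal{K})} f_{\alpha,\varepsilon}(p, \sigma) = \max_{p \in \mathcal{M}(\mathcal{K})} f_{\alpha,\varepsilon}(p, \sigma_\varepsilon)$.
The identity of the two expressions in (25) follows by Sion's
minimax theorem [29], [30], due to Lemmas IV.1 and IV.2.
The formulas in (26) follow from (27). The last assertion
follows from (27) and the fact that
$f_{\alpha,\varepsilon}|_{\mathcal{M}_f(\mathcal{K}) \times \mathcal{S}(\mathcal{H})}$ also
satisfies the conditions in Sion's minimax theorem. ■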
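For the rest, for every $\varepsilon > 0$ we fix a $\sigma_\varepsilon$ as given in
Proposition IV.3. Note that the compactness of $\mathcal{S}(\mathcal{H})$ yields
that there exists a sequence $\{\varepsilon_k\}_{k \in \mathbb{N}}$ and a
$\sigma_0 \in \mathcal{S}(\mathcal{H})$ such
that $\lim_k \varepsilon_k = 0$ and $\lim_k \sigma_{\varepsilon_k} = \sigma_0$.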
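Proposition IV.4. Let $\sigma_0$ be a limit point as above. Then,

$$\sup_{p \in \mathcal{M}(\mathcal{K})} f_{\alpha,0}(p, \sigma_0) = \min_{\sigma \in \mathcal{S}(\mathcal{H})}\ \sup_{p \in \mathcal{M}(\mathcal{K})} f_{\alpha,0}(p, \sigma) = \sup_{p \in \mathcal{M}(\mathcal{K})}\ \min_{\sigma \in \mathcal{S}(\mathcal{H})} f_{\alpha,0}(p, \sigma) \tag{28}$$

$$= \min_{\sigma \in \mathcal{S}(\mathcal{H})}\ \sup_{\rho \in \mathcal{K}} S_\alpha(\rho\,\|\,\sigma) = \sup_{\rho \in \mathcal{K}} S_\alpha(\rho\,\|\,\sigma_0). \tag{29}$$

Moreover, the same relations hold if the suprema over $\mathcal{M}(\mathcal{K})$
are replaced with suprema over $\mathcal{M}_f(\mathcal{K})$.

Proof: By (24), $f_{\alpha,0}(p, \cdot)$ is lower semicontinuous
on $\mathcal{S}(\mathcal{H})$ and hence so is the function
$\sigma \mapsto \sup_{p \in \mathcal{M}(\mathcal{K})} f_{\alpha,0}(p, \sigma)$,
$\sigma \in \mathcal{B}(\mathcal{H})_+$. Therefore, they reach
their infima on $\mathcal{S}(\mathcal{H})$. For every $k \in \mathbb{N}$,

$$\max_{p \in \mathcal{M}(\mathcal{K})} f_{\alpha,0}(p, \sigma_{\varepsilon_k} + \varepsilon_k I) = \max_{p \in \mathcal{M}(\mathcal{K})} f_{\alpha,\varepsilon_k}(p, \sigma_{\varepsilon_k}) = \max_{p \in \mathcal{M}(\mathcal{K})}\ \min_{\sigma \in \mathcal{S}(\mathcal{H})} f_{\alpha,\varepsilon_k}(p, \sigma) \le \sup_{p \in \mathcal{M}(\mathcal{K})}\ \min_{\sigma \in \mathcal{S}(\mathcal{H})} f_{\alpha,0}(p, \sigma), \tag{30}$$

where the first identity is by definition, the second is due
to Proposition IV.3, and the inequality follows from (24).
Furthermore,

$$\sup_{p \in \mathcal{M}(\mathcal{K})}\ \min_{\sigma \in \mathcal{S}(\mathcal{H})} f_{\alpha,0}(p, \sigma) \le \min_{\sigma \in \mathcal{S}(\mathcal{H})}\ \sup_{p \in \mathcal{M}(\mathcal{K})} f_{\alpha,0}(p, \sigma) \le \sup_{p \in \mathcal{M}(\mathcal{K})} f_{\alpha,0}(p, \sigma_0)$$

$$\le \liminf_{k \to \infty}\ \sup_{p \in \mathcal{M}(\mathcal{K})} f_{\alpha,0}(p, \sigma_{\varepsilon_k} + \varepsilon_k I) \le \sup_{p \in \mathcal{M}(\mathcal{K})}\ \min_{\sigma \in \mathcal{S}(\mathcal{H})} f_{\alpha,0}(p, \sigma),$$

where the first two inequalities are obvious, the third
one follows from the lower semicontinuity of
$\sigma \mapsto \sup_{p \in \mathcal{M}(\mathcal{K})} f_{\alpha,0}(p, \sigma)$,
$\sigma \in \mathcal{B}(\mathcal{H})_+$, and the last inequality is
due to (30). This gives the identities in (28), and the identities
in (29) follow the same way as in Proposition IV.3. The last
assertion follows by repeating the argument above with the
suprema and maxima over $\mathcal{M}(\mathcal{K})$ replaced with suprema over
$\mathcal{M}_f(\mathcal{K})$. ■

Remark IV.5. Note that the minima over $\mathcal{S}(\mathcal{H})$ in (28) and
(29) can be replaced with infima over $\mathcal{S}(\mathcal{H})_{++}$.

Proof: The trivial inequality
$(1 - \varepsilon)\sigma + \varepsilon(1/d)I \ge (1 - \varepsilon)\sigma$
yields

$$S_\alpha\big(\rho\,\|\,(1 - \varepsilon)\sigma + \varepsilon(1/d)I\big) + \log(1 - \varepsilon) \le S_\alpha(\rho\,\|\,\sigma) \tag{31}$$

for every $\varepsilon \in (0, 1)$, $\rho \in \mathcal{K}$ and
$\sigma \in \mathcal{B}(\mathcal{H})_+$, and hence, for
every $p \in \mathcal{M}(\mathcal{K})$,

$$f_{\alpha,0}\big(p, (1 - \varepsilon)\sigma + \varepsilon(1/d)I\big) + \log(1 - \varepsilon) \le f_{\alpha,0}(p, \sigma). \tag{32}$$

Thus,

$$\inf_{\sigma \in \mathcal{S}(\mathcal{H})_{++}} f_{\alpha,0}(p, \sigma) \ge \min_{\sigma \in \mathcal{S}(\mathcal{H})} f_{\alpha,0}(p, \sigma) \ge \min_{\sigma \in \mathcal{S}(\mathcal{H})} f_{\alpha,0}\big(p, (1 - \varepsilon)\sigma + \varepsilon(1/d)I\big) + \log(1 - \varepsilon) \ge \inf_{\sigma \in \mathcal{S}(\mathcal{H})_{++}} f_{\alpha,0}(p, \sigma) + \log(1 - \varepsilon),$$

and by taking the supremum in $\varepsilon$ we get
$\inf_{\sigma \in \mathcal{S}(\mathcal{H})_{++}} f_{\alpha,0}(p, \sigma) = \min_{\sigma \in \mathcal{S}(\mathcal{H})} f_{\alpha,0}(p, \sigma)$.
The assertion about the other two minima can be obtained by
repeating the same argument after taking the supremum over
$\rho \in \mathcal{K}$ in (31) and the supremum over
$p \in \mathcal{M}(\mathcal{K})$ in (32), respectively. ■

Remark IV.6. The first supremum in (28) and the last one in
(29) can be replaced with maxima.

Proof: By Proposition IV.4,

$$\sup_{\rho \in \mathcal{K}} S_\alpha(\rho\,\|\,\sigma_0) = \min_{\sigma \in \mathcal{S}(\mathcal{H})}\ \sup_{\rho \in \mathcal{K}} S_\alpha(\rho\,\|\,\sigma) \le \sup_{\rho \in \mathcal{K}} S_\alpha(\rho\,\|\,(1/d)I) = \sup_{\rho \in \mathcal{K}}\{\log d - S_\alpha(\rho)\} \le \log d.$$

Thus, $S_\alpha(\rho\,\|\,\sigma_0)$ is finite, and therefore it is given as
$S_\alpha(\rho\,\|\,\sigma_0) = \frac{1}{\alpha - 1}\log \operatorname{Tr}\rho^\alpha \sigma_0^{1-\alpha}$
for every $\rho \in \mathcal{K}$. This
yields that $\rho \mapsto S_\alpha(\rho\,\|\,\sigma_0)$ on $\mathcal{K}$ and
$p \mapsto f_{\alpha,0}(p, \sigma_0)$ on
$\mathcal{M}(\mathcal{K})$ are continuous, and hence they reach their suprema. ■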



Since in the proofs of Propositions IV.3 and IV.4 we only
used the properties of $f_{\alpha,\varepsilon}$ established in Lemmas IV.1 and
IV.2, which are common with the properties of $g_{\alpha,\varepsilon}$, we have
the following:

Proposition IV.7. The assertions of Propositions IV.3 and IV.4
hold true if we replace $f_{\alpha,\varepsilon}$ with $g_{\alpha,\varepsilon}$ for all $\varepsilon\ge 0$, and $S_\alpha$
with $Q_\alpha$.

Now we are ready to prove the following: 

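Theorem IV.8. Let $W:\mathcal{X}\to\mathcal{S}(\mathcal{H})$ be a classical-quantum
channel with compact image. Then, the capacities defined in
(8)-(10) are equal to each other when $D=S_\alpha$ is a Rényi
relative entropy with parameter $\alpha\in(0,2]$.

Proof: The assertion is obvious for $\alpha=1$ from the identities
(2) and (3), so for the rest we assume that $\alpha\in(0,2]\setminus\{1\}$.
Let $\mathcal{K}:=\operatorname{ran}W$. Proposition IV.4 yields that

$$\begin{aligned}
\chi^*_{S_\alpha,2}(W) &= \sup_{p\in\mathcal{M}_f(\mathcal{X})}\inf_{\sigma\in\mathcal{S}(\mathcal{H})}\sum_{x\in\mathcal{X}}p(x)\,S_\alpha(W_x\|\sigma)\\
&= \sup_{p\in\mathcal{M}_f(\mathcal{K})}\inf_{\sigma\in\mathcal{S}(\mathcal{H})}\sum_{\rho\in\mathcal{K}}p(\rho)\,S_\alpha(\rho\|\sigma)\\
&= \sup_{p\in\mathcal{M}_f(\mathcal{K})}\min_{\sigma\in\mathcal{S}(\mathcal{H})} f_{\alpha,0}(p,\sigma)\\
&= \min_{\sigma\in\mathcal{S}(\mathcal{H})}\sup_{\rho\in\mathcal{K}} S_\alpha(\rho\|\sigma)\\
&= R_{S_\alpha}(\operatorname{ran}W).
\end{aligned}$$

Let $\mathrm{id}$ be the identical channel on $\mathcal{K}=\operatorname{ran}W$, and let
$\widehat{\mathrm{id}}:\rho\mapsto\delta_\rho\otimes\rho$ be its lifting as in the Introduction. Using
Proposition IV.7, we have

$$\begin{aligned}
\chi^*_{S_\alpha,1}(W) &= \sup_{p\in\mathcal{M}_f(\mathcal{X})}\inf_{\sigma\in\mathcal{S}(\mathcal{H})} S_\alpha\bigl(\mathbb{E}_p\hat{W}\,\|\,p\otimes\sigma\bigr)\\
&= \sup_{p\in\mathcal{M}_f(\mathcal{K})}\inf_{\sigma\in\mathcal{S}(\mathcal{H})} S_\alpha\bigl(\mathbb{E}_p\widehat{\mathrm{id}}\,\|\,p\otimes\sigma\bigr)\\
&= \sup_{p\in\mathcal{M}_f(\mathcal{K})}\inf_{\sigma\in\mathcal{S}(\mathcal{H})}\frac{1}{\alpha-1}\log\bigl(\operatorname{sign}(\alpha-1)\,g_{\alpha,0}(p,\sigma)\bigr)\\
&= \frac{1}{\alpha-1}\log\Bigl(\operatorname{sign}(\alpha-1)\sup_{p\in\mathcal{M}_f(\mathcal{K})}\min_{\sigma\in\mathcal{S}(\mathcal{H})} g_{\alpha,0}(p,\sigma)\Bigr)\\
&= \frac{1}{\alpha-1}\log\Bigl(\operatorname{sign}(\alpha-1)\min_{\sigma\in\mathcal{S}(\mathcal{H})}\sup_{\rho\in\mathcal{K}} Q_\alpha(\rho\|\sigma)\Bigr)\\
&= \min_{\sigma\in\mathcal{S}(\mathcal{H})}\sup_{\rho\in\mathcal{K}}\frac{1}{\alpha-1}\log\bigl(\operatorname{sign}(\alpha-1)\,Q_\alpha(\rho\|\sigma)\bigr)\\
&= R_{S_\alpha}(\operatorname{ran}W).
\end{aligned}$$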



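While this bound might be rather loose for one single use of
the channel, it is asymptotically optimal in the sense that it
yields the Holevo capacity as a lower bound on the optimal
asymptotic transmission rate of the channel [14].

In order to give an upper bound on the capacity, one has to
find an upper bound on the success probability for any code
$(M,\varphi,E)$ in terms of $M$. Such a bound was given in [31],
which we briefly outline below. Note that the function $x\mapsto x^{1/\alpha}$
is operator monotonic increasing for $\alpha\in[1,+\infty)$ and thus
$W_{\varphi(k)} = \bigl(W_{\varphi(k)}^{\alpha}\bigr)^{1/\alpha} \le \bigl(\sum_{m=1}^{M}W_{\varphi(m)}^{\alpha}\bigr)^{1/\alpha}$. Hence, the average
success probability is upper bounded as

$$P_s(M,\varphi,E) = \frac{1}{M}\sum_{k=1}^{M}\operatorname{Tr}W_{\varphi(k)}E_k \le \frac{1}{M}\sum_{k=1}^{M}\operatorname{Tr}\Bigl(\sum_{m=1}^{M}W_{\varphi(m)}^{\alpha}\Bigr)^{1/\alpha}E_k = \frac{1}{M}\operatorname{Tr}\Bigl(\sum_{m=1}^{M}W_{\varphi(m)}^{\alpha}\Bigr)^{1/\alpha}.$$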

V. The one-shot classical capacity of quantum channels

Let $W:\mathcal{X}\to\mathcal{S}(\mathcal{H})$ be a classical-quantum channel. In
order to transmit (classical) information through the channel,
the sender has to encode the messages into signals at the input
of the channel, and the receiver has to make a measurement
at the output to determine which message was sent. A
code is a triple $(M,\varphi,E)$, where $\{1,\ldots,M\}$ labels the
possible messages to transmit, $\varphi:\{1,\ldots,M\}\to\mathcal{X}$ is the
encoding map, and the positive operator valued measurement
$E:\{1,\ldots,M\}\to\mathcal{B}(\mathcal{H})_+$, $\sum_{i=1}^{M}E_i=I$, is the decoding.
The average probability of an erroneous decoding is given by

$$P_e(M,\varphi,E) := \frac{1}{M}\sum_{i=1}^{M}\bigl(1-\operatorname{Tr}W_{\varphi(i)}E_i\bigr) = 1-P_s(M,\varphi,E),$$

where $P_s(M,\varphi,E)$ is the success probability. The one-shot
$\varepsilon$-capacity of the channel is defined as the logarithm of the
maximal number of messages that can be transmitted through
the channel with error not exceeding $\varepsilon$:

$$C_\varepsilon(W) := \max\{\log M \mid \exists\,(M,\varphi,E)\ \text{such that}\ P_e(M,\varphi,E)\le\varepsilon\}.$$
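As an illustration of these definitions, the following minimal sketch (assuming NumPy; the function names and the two-message qubit code are hypothetical choices made for this example) evaluates $P_s$ and $P_e$ for the signal states $|0\rangle\langle 0|$ and $|+\rangle\langle +|$ decoded by a measurement in the computational basis.

```python
import numpy as np

def success_probability(states, povm):
    """P_s = (1/M) * sum_i Tr(W_phi(i) E_i) for a code with M messages."""
    M = len(states)
    return sum(np.trace(w @ e).real for w, e in zip(states, povm)) / M

def error_probability(states, povm):
    """P_e = 1 - P_s."""
    return 1.0 - success_probability(states, povm)

# Hypothetical code with M = 2: signals |0><0| and |+><+|,
# decoded by the projective measurement {|0><0|, |1><1|}.
ket0 = np.array([1.0, 0.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2)
states = [np.outer(ket0, ket0), np.outer(ketp, ketp)]
povm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

p_err = error_probability(states, povm)
print(p_err)  # 0.25 up to rounding: P_s = (1 + 1/2) / 2
```

Since this code has $M=2$ and error $1/4$, it certifies $C_\varepsilon(W)\ge\log 2$ for every $\varepsilon\ge 1/4$ whenever both signal states are outputs of $W$.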

Let $\chi^*_{H_r,0}(W)$ and $\chi^*_{S_\alpha,0}(W)$ denote the generalizations of
the Holevo capacity of $W$ as defined in (7), for a Hoeffding
distance with parameter $r$ and for a Rényi relative entropy
with parameter $\alpha$, respectively. For any $\varepsilon>0$ and any $c>0$,
the one-shot $\varepsilon$-capacity can be lower bounded as

C.iW) >xk.<™^oW- log + 



: sup 

0<Q<1 



-a log 



■l+c\ 



1 — a 



log 



/2 + c+l/c 



where the inequality was shown in [14], and the identity is 
obvious from the definition (19) of the Hoeffding distances. 



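$$\frac{1}{M}\operatorname{Tr}\Bigl(\sum_{m=1}^{M}W_{\varphi(m)}^{\alpha}\Bigr)^{1/\alpha} \le M^{\frac{1-\alpha}{\alpha}}\sup_{p\in\mathcal{M}_f(\mathcal{X})}2^{\frac{\alpha-1}{\alpha}\chi_\alpha(p)}, \tag{33}$$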



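where

$$\chi_\alpha(p) := \frac{\alpha}{\alpha-1}\log\operatorname{Tr}\omega(p),\qquad \omega(p) := \Bigl(\sum_{x\in\mathcal{X}}p(x)W_x^{\alpha}\Bigr)^{1/\alpha}.$$

As it was pointed out in [13], [32], for any $\sigma\in\mathcal{S}(\mathcal{H})$ and
$p\in\mathcal{M}_f(\mathcal{X})$ we have

$$S_\alpha\bigl(\mathbb{E}_p\hat{W}\,\|\,p\otimes\sigma\bigr) = S_\alpha\Bigl(\mathbb{E}_p\hat{W}\,\Big\|\,p\otimes\frac{\omega(p)}{\operatorname{Tr}\omega(p)}\Bigr) + S_\alpha\Bigl(\frac{\omega(p)}{\operatorname{Tr}\omega(p)}\,\Big\|\,\sigma\Bigr) = \chi_\alpha(p) + S_\alpha\Bigl(\frac{\omega(p)}{\operatorname{Tr}\omega(p)}\,\Big\|\,\sigma\Bigr), \tag{34}$$

and hence

$$\chi_\alpha(p) = \inf_{\sigma\in\mathcal{S}(\mathcal{H})} S_\alpha\bigl(\mathbb{E}_p\hat{W}\,\|\,p\otimes\sigma\bigr), \tag{35}$$

which in turn yields

$$\sup_{p\in\mathcal{M}_f(\mathcal{X})}\chi_\alpha(p) = \chi^*_{S_\alpha,1}(W). \tag{36}$$

The above observations lead to the following:

Theorem V.1. For any $\varepsilon>0$, we have

$$C_\varepsilon(W) \le \inf_{\alpha>1}\Bigl\{\chi^*_{S_\alpha,1}(W) + \frac{\alpha}{\alpha-1}\log\frac{1}{1-\varepsilon}\Bigr\}.$$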

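Proof: Assume that for a code $(M,\varphi,E)$ we have
$P_e(M,\varphi,E)\le\varepsilon$. Then, by the above,

$$\log(1-\varepsilon) \le \log P_s(M,\varphi,E) \le \frac{\alpha-1}{\alpha}\bigl(\chi^*_{S_\alpha,1}(W)-\log M\bigr)$$

for every $\alpha>1$, from which the assertion follows immediately
by solving for $\log M$. ■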
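Two ingredients of this argument can be checked numerically: the decomposition of $S_\alpha(\mathbb{E}_p\hat{W}\|p\otimes\sigma)$ into $\chi_\alpha(p)$ plus a Rényi relative entropy of the normalized $\omega(p)$, and the bound $P_s\le\frac{1}{M}\operatorname{Tr}\bigl(\sum_m W_{\varphi(m)}^{\alpha}\bigr)^{1/\alpha}$. The sketch below assumes NumPy and natural logarithms; all helper names are hypothetical, the states are random, and the decoder is the pretty good measurement, chosen here only as a convenient concrete POVM.

```python
import numpy as np

def random_state(d, rng):
    """Random full-rank density matrix (hypothetical helper)."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

def mpow(a, s):
    """Real power of a positive definite Hermitian matrix."""
    w, v = np.linalg.eigh(a)
    return (v * w**s) @ v.conj().T

def sibson_gap(states, p, sigma, alpha):
    """Difference between the two sides of the decomposition
    S_a(E_p W^ || p (x) sigma) = chi_a(p) + S_a(omega / Tr omega || sigma)."""
    omega = mpow(sum(px * mpow(w, alpha) for px, w in zip(p, states)), 1 / alpha)
    tr_omega = np.trace(omega).real
    chi = alpha / (alpha - 1) * np.log(tr_omega)
    s_pow = mpow(sigma, 1 - alpha)
    # Block-diagonal structure gives Tr = sum_x p(x) Tr W_x^alpha sigma^(1-alpha).
    lhs = np.log(sum(px * np.trace(mpow(w, alpha) @ s_pow).real
                     for px, w in zip(p, states))) / (alpha - 1)
    rhs = chi + np.log(np.trace(mpow(omega / tr_omega, alpha) @ s_pow).real) / (alpha - 1)
    return lhs - rhs

def success_bound_holds(states, alpha):
    """Check P_s <= (1/M) Tr (sum_m W_m^alpha)^(1/alpha) for the pretty good measurement."""
    M = len(states)
    s_inv_half = mpow(sum(states), -0.5)
    povm = [s_inv_half @ w @ s_inv_half for w in states]
    p_s = sum(np.trace(w @ e).real for w, e in zip(states, povm)) / M
    bound = np.trace(mpow(sum(mpow(w, alpha) for w in states), 1 / alpha)).real / M
    return p_s <= bound + 1e-10

def demo(alpha=1.7, d=3, M=4, seed=0):
    rng = np.random.default_rng(seed)
    states = [random_state(d, rng) for _ in range(M)]
    p = np.full(M, 1.0 / M)
    sigma = random_state(d, rng)
    return sibson_gap(states, p, sigma, alpha), success_bound_holds(states, alpha)

gap, ok = demo()
print(abs(gap) < 1e-9, ok)
```

The gap is zero up to rounding for any full-rank $\sigma$, which is what makes the infimum over $\sigma$ in the formula for $\chi_\alpha(p)$ explicit: it is attained at $\sigma=\omega(p)/\operatorname{Tr}\omega(p)$.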
For each $n\in\mathbb{N}$, consider the $n$th i.i.d. extension of $W$,
defined as $W^{(n)}:\mathcal{X}^n\to\mathcal{S}(\mathcal{H}^{\otimes n})$,

$$W^{(n)}(x_1,\ldots,x_n) := W(x_1)\otimes\ldots\otimes W(x_n).$$

The rate $R(\mathcal{C})$ of a sequence of codes $\mathcal{C}=\{\mathcal{C}^{(n)}=(M^{(n)},\varphi^{(n)},E^{(n)})\}_{n\in\mathbb{N}}$ is $R(\mathcal{C}) := \liminf_{n\to\infty}\frac{1}{n}\log M^{(n)}$,
and the asymptotic $\varepsilon$-capacity of $W$ (with product encoding)
is defined as

$$\overline{C}_\varepsilon(W) := \sup\bigl\{R(\mathcal{C}) \mid \limsup_{n\to\infty}P_e(\mathcal{C}^{(n)})\le\varepsilon\bigr\},$$

where the supremum is taken over sequences of codes satisfying
the indicated criterion. One can easily see that

$$\liminf_{n\to\infty}\frac{1}{n}C_\varepsilon(W^{(n)}) \le \overline{C}_\varepsilon(W) \le \overline{C}_{\varepsilon'}(W) \le \liminf_{n\to\infty}\frac{1}{n}C_{\varepsilon''}(W^{(n)})$$

for any $0<\varepsilon<\varepsilon'<\varepsilon''$. The upper bound in Theorem
V.1 is asymptotically sharp in the sense that it yields the
Holevo capacity as an upper bound on the optimal information
carrying capacity in the asymptotic limit. The details of the
proof of the following Theorem are supplied in Appendix B.

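Theorem V.2. Assume that $\operatorname{ran}W$ is compact. Then, for any
$\varepsilon\in[0,1)$,

$$\overline{C}_\varepsilon(W) \le \chi^*_S(W).$$

Proof: By Theorem V.1 and Proposition B.2,

$$\overline{C}_\varepsilon(W) \le \liminf_{n\to\infty}\frac{1}{n}\Bigl(\chi^*_{S_\alpha,1}(W^{(n)}) + \frac{\alpha}{\alpha-1}\log\frac{1}{1-\varepsilon'}\Bigr) = \chi^*_{S_\alpha,1}(W)$$

for any $0<\varepsilon<\varepsilon'<1$ and $\alpha>1$. By Proposition B.5,
the assertion follows for every $\varepsilon>0$, and the case $\varepsilon=0$ is
immediate from $\overline{C}_0(W)\le\overline{C}_\varepsilon(W)$, $\varepsilon>0$. ■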
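The $n$th i.i.d. extension used above is a plain tensor power of the channel outputs, which for matrices is a Kronecker product. A minimal sketch, assuming NumPy and a channel with finitely many inputs stored as a dictionary (the representation and the name `iid_extension` are choices made for this example only):

```python
import numpy as np
from functools import reduce
from itertools import product

def iid_extension(channel, n):
    """Return W^(n) as a dict mapping input n-tuples to W(x1) (x) ... (x) W(xn)."""
    return {xs: reduce(np.kron, (channel[x] for x in xs))
            for xs in product(channel, repeat=n)}

# Hypothetical two-input qubit channel: outputs |0><0| and |+><+|.
channel = {0: np.diag([1.0, 0.0]), 1: np.full((2, 2), 0.5)}
ext = iid_extension(channel, 3)
print(len(ext), ext[(0, 1, 0)].shape)  # 8 inputs, each output is 8 x 8
```

The number of inputs and the output dimension both grow exponentially in $n$, which is why the asymptotic statements are derived analytically rather than by enumeration.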

Remark V.3. Cutoff rates were also defined in [8] for channel
coding in the following way: for $\kappa<0$, the $\kappa$-cutoff rate
$C_\kappa(W)$ is the largest $R_0$ for which

$$\limsup_{n\to\infty}\frac{1}{n}\log P_e(\mathcal{C}^{(n)}) \le \kappa(R_0-R)$$

for any sequence of codes with rate $R$, while for $\kappa>0$, the
$\kappa$-cutoff rate $C_\kappa(W)$ is the smallest $R_0$ for which

$$\limsup_{n\to\infty}\frac{1}{n}\log P_s(\mathcal{C}^{(n)}) \le \kappa(R_0-R)$$

for any sequence of codes with rate $R$.

Inequality (33) and identity (36), together with the observations
of Appendix B, yield that, for $\alpha>1$,

$$\limsup_{n\to\infty}\frac{1}{n}\log P_s(\mathcal{C}^{(n)}) \le \frac{\alpha-1}{\alpha}\bigl(\chi^*_{S_\alpha,1}(W)-R\bigr)$$

for any sequence of codes with rate $R$ and hence, with $\kappa=(\alpha-1)/\alpha$,

$$C_\kappa(W) \le \chi^*_{S_{1/(1-\kappa)},1}(W),\qquad 0<\kappa<1.$$

The above inequality was shown to hold as an equality for
classical channels in [8].



VI. Remarks on the divergence radius 

Let $\Sigma$ be a subset of the state space $\mathcal{S}(\mathcal{H})$, and let $R_D(\Sigma)$
denote its $D$-radius as given in (6). A state $\sigma^*$ which reaches
the infimum in (6) is called a $D$-centre for $\Sigma$. As we have seen
in the previous section, the $S_\alpha$-radii of the range of a channel
are related to the direct part of channel coding for $\alpha\in[0,1)$
and to the converse part for $\alpha\in(1,+\infty]$. In both cases, the
asymptotically relevant quantities are the divergence radii with
$\alpha$ close to 1. On the other hand, for state discrimination the
relevant quantity turns out to be the $\infty$-radius. More precisely,
if $\rho_1,\ldots,\rho_r\in\mathcal{S}(\mathcal{H})$ then the optimal success probability
of discriminating them by POVM measurements is given by
$P_s = (1/r)\exp\bigl(R_{S_{\max}}(\{\rho_k\})\bigr)$ [33], where $S_{\max}$ is the max-relative
entropy [24].

Related to state discrimination is the following geometrical
problem: given $\rho_1,\ldots,\rho_r\in\mathcal{S}(\mathcal{H})$, find the largest $q$ such
that there exist states $\tau_1,\ldots,\tau_r$ such that $q\rho_i+(1-q)\tau_i$ is
independent of $i$. Such a family of states $\tau_1,\ldots,\tau_r$ is called an
optimal Helstrom family with parameter $q$ in [34]. As one can
easily see, the largest such $q$ is given by $\exp\bigl(-R_{S_{\max}}(\{\rho_k\})\bigr)$,
and $q\rho_i+(1-q)\tau_i$ is an $S_{\max}$-centre for $\{\rho_k\}_{k=1}^{r}$. When
$r=2$, the results of Holevo [35] and Helstrom [36] yield that
the optimal success probability is given by $P_s=(1+D)/2$,
where $D:=(1/2)\|\rho_1-\rho_2\|_1$, and hence, $R_{S_{\max}}(\{\rho_1,\rho_2\}) = \log(1+D)$.
Moreover, an $S_{\max}$-centre is given by $\sigma^* = (\rho_1+X_-)/(1+D) = (\rho_2+X_+)/(1+D)$, where $X_+$ and $X_-$
are the positive and the negative parts of $\rho_1-\rho_2$, respectively.
In [38] and [37], a suboptimal Helstrom family was used for
two states $\rho_1$ and $\rho_2$ to show Fannes type inequalities. Using
instead the above optimal Helstrom family in the proof of [37,
Proposition 1], one obtains the following:

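Proposition VI.1. Let $\mathcal{H}$ be a Hilbert space and $f:\mathcal{S}(\mathcal{H})\to\mathbb{C}$
be a bounded function that satisfies

$$\bigl|f((1-\varepsilon)\rho_1+\varepsilon\rho_2) - (1-\varepsilon)f(\rho_1) - \varepsilon f(\rho_2)\bigr| \le h_2(\varepsilon) \tag{37}$$

for any two states $\rho_1,\rho_2$ and any $\varepsilon\in[0,1]$, where $h_2(x) := -x\log x-(1-x)\log(1-x)$ is the binary entropy function.
Then, for any two states $\rho_1,\rho_2$ on $\mathcal{H}$, we have

$$|f(\rho_1)-f(\rho_2)| \le 2h_2(\varepsilon)+4\varepsilon M, \tag{38}$$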

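Proof: Let $\tau_1,\tau_2$ be the above optimal Helstrom family
and $\sigma^* = (1-\varepsilon)\rho_i+\varepsilon\tau_i$ be the $S_{\max}$-centre of $\{\rho_1,\rho_2\}$. Then,

$$\begin{aligned}
|f(\rho_1)-f(\rho_2)| &\le |f(\rho_1)-f(\sigma^*)| + |f(\sigma^*)-f(\rho_2)|\\
&\le \sum_{i=1}^{2}\bigl(h_2(\varepsilon)+\varepsilon|f(\rho_i)|+\varepsilon|f(\tau_i)|\bigr)\\
&\le 2h_2(\varepsilon)+4\varepsilon M,
\end{aligned}$$

where the second inequality is (37) applied to each pair $(\rho_i,\tau_i)$ and the last step uses that $|f|$ is bounded by $M$.

■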
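The two-state construction behind this proof can be verified numerically. The sketch below assumes NumPy; the helper names are hypothetical, and the explicit family $\tau_1=X_-/D$, $\tau_2=X_+/D$ with $q=1/(1+D)$ is our reading of the construction, derived from $\rho_1-\rho_2=X_+-X_-$ rather than quoted from [34]. It checks that the candidate centre is a state, that $(1+D)\sigma^*$ dominates both $\rho_1$ and $\rho_2$, and that both mixtures $q\rho_i+(1-q)\tau_i$ coincide.

```python
import numpy as np

def random_state(d, rng):
    """Random full-rank density matrix (hypothetical helper)."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

def smax_centre(rho1, rho2):
    """Return (centre, D, X_plus, X_minus) for the two-state case."""
    w, v = np.linalg.eigh(rho1 - rho2)
    x_plus = (v * np.clip(w, 0, None)) @ v.conj().T    # positive part of rho1 - rho2
    x_minus = (v * np.clip(-w, 0, None)) @ v.conj().T  # negative part of rho1 - rho2
    D = 0.5 * np.abs(w).sum()                          # D = (1/2) ||rho1 - rho2||_1
    return (rho1 + x_minus) / (1 + D), D, x_plus, x_minus

def check(d=3, seed=0):
    rng = np.random.default_rng(seed)
    rho1, rho2 = random_state(d, rng), random_state(d, rng)
    centre, D, x_plus, x_minus = smax_centre(rho1, rho2)
    q = 1 / (1 + D)
    mix1 = q * rho1 + (1 - q) * x_minus / D
    mix2 = q * rho2 + (1 - q) * x_plus / D
    dominated = all(np.linalg.eigvalsh((1 + D) * centre - r).min() > -1e-12
                    for r in (rho1, rho2))
    return (abs(np.trace(centre).real - 1) < 1e-12, dominated,
            np.allclose(mix1, centre) and np.allclose(mix2, centre))

print(check())
```

Domination of both states by $(1+D)\sigma^*$ shows $R_{S_{\max}}(\{\rho_1,\rho_2\})\le\log(1+D)$; the matching lower bound is the Holevo-Helstrom result quoted above.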

The von Neumann entropy is known to satisfy (37), which
in turn yields by a simple computation that the conditional
entropy and the relative entropy distance from a convex set
containing a faithful state satisfy (37), too. Note that for the
latter two quantities (38) yields a slight improvement of the
result of [38] and of [37, Lemma 1], respectively, where the
same bound was obtained with $\varepsilon = \|\rho_1-\rho_2\|_1$.

For the case where $D$ is the relative entropy $S$, it was
shown in [12] that for any subset $\Sigma$ of states, the $S$-centre
is unique and is inside the closed convex hull $\overline{\mathrm{co}}\,\Sigma$ of $\Sigma$.
This is no longer true for other Rényi relative entropies in
general. For instance, for the classical probability distributions
$p_1 := (1/2,1/4,1/4)$, $p_2 := (1/2,1/6,1/3)$, an $S_\infty$-centre
is given by $\sigma^* = (6/13,3/13,4/13)$, and one can easily
verify that no $S_\infty$-centre can be found on the line segment
connecting $p_1$ and $p_2$. It is of some mathematical interest to
find conditions on $D$ ensuring the existence of a unique $D$-centre
of $\Sigma$ in $\overline{\mathrm{co}}\,\Sigma$ for any subset of states $\Sigma$.
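This counterexample can be verified with exact rational arithmetic. The sketch below assumes the classical formula $\exp S_\infty(p\|\sigma)=\max_i p_i/\sigma_i$; the function names are hypothetical. It compares the value of $\max_k\exp S_\infty(p_k\|\sigma)$ at $\sigma^*$ with its minimum along the segment between $p_1$ and $p_2$.

```python
from fractions import Fraction as F

def max_ratio(p, s):
    """exp S_infty(p||s) = max_i p_i / s_i for classical distributions."""
    return max(pi / si for pi, si in zip(p, s))

p1 = (F(1, 2), F(1, 4), F(1, 4))
p2 = (F(1, 2), F(1, 6), F(1, 3))
centre = (F(6, 13), F(3, 13), F(4, 13))

def radius_at(s):
    """max_k exp S_infty(p_k||s): the quantity minimised by an S_infty-centre."""
    return max(max_ratio(p1, s), max_ratio(p2, s))

def segment(t):
    """Point (1 - t) p1 + t p2 on the segment joining p1 and p2."""
    return tuple((1 - t) * a + t * b for a, b in zip(p1, p2))

r_centre = radius_at(centre)               # 13/12
r_best_on_segment = radius_at(segment(F(3, 7)))  # 7/6, attained at t = 3/7
r_grid = min(radius_at(segment(F(k, 1000))) for k in range(1001))

print(r_centre, r_best_on_segment, r_grid >= r_best_on_segment)
```

Along the segment the ratio for $p_1$ increases in $t$ while the ratio for $p_2$ decreases, so their maximum is smallest where they cross, at $t=3/7$ with value $7/6>13/12$. Hence no point of the segment attains the $S_\infty$-radius $\log(13/12)$.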

VII. Concluding remarks 

The idea of representing the Rényi relative entropies as
cutoff rates is from Csiszár [8], and we essentially followed
his approach here. Note, however, that the analysis of the error
exponents $\underline{h}_r, \overline{h}_r, h_r$ in the classical case, on which the proof
of [8] relies, is based on the Hellinger arc and a representation
of the Hoeffding distances that have no equivalents in the
quantum setting [2]. Instead, our analysis is based on an
equivalent definition of the Hoeffding distances that can be
defined also for quantum states, given in (19). That this
definition of the Hoeffding distances has the right operational
meaning was proven recently under the name of the quantum
Hoeffding bound [3]-[6]. Note that this representation of the
Hoeffding distances allows for a somewhat simplified proof
even in the classical case. Moreover, this proof works also
for the more general setting of correlated states considered in
Theorem III.3.

The way to prove the identity of the different definitions of
the Rényi capacities using minimax results is also from [8]. For
this, the convexity of $\sigma\mapsto Q_\alpha(\rho\|\sigma)$ and $\sigma\mapsto S_\alpha(\rho\|\sigma)$ for
every fixed $\rho$ are essential. These are obvious in the classical
case for $Q_\alpha$, and for $S_\alpha$ when $\alpha\in(0,1)$, and were proven
for $S_\alpha$ and $\alpha>1$ in [8]. That proof, however, cannot be
extended to the quantum case and, as far as we are aware, our
Theorem II.1 is a new result. Note that in the quantum case the
fact that $x\mapsto x^{1-\alpha}$ is not operator convex for $\alpha>2$ yields a
strong limitation, and no convexity properties of the $\alpha$-relative
entropies are expected to hold for parameters $\alpha>2$. This
limitation was overcome in [13], where a completely different
approach was used to prove that $\chi^*_{S_\alpha,1}(W) = R_{S_\alpha}(\operatorname{ran}W)$ for all
$\alpha>1$. Another subtle technical difference between the proofs
for the classical (more precisely, finite $\mathcal{X}$) and the general
cases comes from the fact that in minimax theorems one of
the sets has to be compact and convex, which in the first case
can be chosen to be $\mathcal{M}_f(\mathcal{X})$, and the other space has to be
convex, which is chosen to be $\mathcal{S}(\mathcal{H})_{++}$. In the general case
$\mathcal{X}$ is usually the state space of a quantum system, which is
of infinite cardinality and hence $\mathcal{M}_f(\mathcal{X})$ is convex but not
compact, whereas replacing $\mathcal{M}_f(\mathcal{X})$ with $\mathcal{M}_m(\operatorname{ran}W)$ as in
Appendix B yields a space that is compact but not convex.
Hence we switched the role of the two spaces and chose $\mathcal{S}(\mathcal{H})$
to be the compact convex set. However, the (dis)continuity
properties of the Rényi relative entropies would then not make
it possible to satisfy the continuity requirements of minimax
theorems, which is why we had to use $\varepsilon$-perturbations in
Section IV.

It is worth noting that Rényi relative entropies and the corresponding
channel capacities are related to different regimes of
information-theoretic tasks for the parameter values $\alpha\in(0,1)$
and for $\alpha\in(1,+\infty)$. Indeed, the first interval is related to
the so-called direct part of problems, i.e., where a relevant
error probability decays exponentially for rates below the
optimal one, while the second interval is related to the (strong)
converse regions, where a relevant success probability goes to
zero (exponentially) for rates above the optimal rate. Cutoff
rates are also defined in an asymmetric way, separately for
the direct region ($\kappa<0$) and for the strong converse region
($\kappa>0$); see Remark V.3 and [8] for more details.

In the case of hypothesis testing between $\rho$ and $\sigma$, for rates
$r<S(\sigma\|\rho)$, the optimal exponential decay rates of the
error probabilities of the second kind are given explicitly by
the Hoeffding distances $H_r(\rho\|\sigma)$, which are defined through
the Rényi relative entropies $S_\alpha(\rho\|\sigma)$, $\alpha\in(0,1)$. For rates
$r>S(\sigma\|\rho)$, the success probabilities decay exponentially,
and the optimal decay rates are known in the classical case
to be given by the Han-Kobayashi bounds [2], [39], [40],
defined through $S_\alpha(\rho\|\sigma)$, $\alpha\in(1,+\infty)$. In the quantum
case, however, the exact error exponents for the converse part
are not known and hence it is not possible to extend the results
of [8] on the cutoff rates for $\kappa>0$ at the moment, though
the results of [2], [40] give inequalities between the cutoff
rates and the Rényi relative entropies that are expected to hold
as equalities. For channel coding, the exact error exponents
are not known for every rate value even in the classical
case, but we see the same picture, i.e., the exponential decay
of error probabilities for rates below the Shannon capacity
can be expressed in terms of, or upper bounded by, the
Rényi capacities $\chi^*_{S_\alpha}$ with $\alpha\in(0,1)$, while for rates above
the Shannon capacity, the exponential decay rate of success
probabilities can be expressed in terms of the Rényi capacities
$\chi^*_{S_\alpha}$ with $\alpha\in(1,+\infty)$ [8].

Due to finite-size effects, the one-shot capacities are discontinuous
functions of the error bar $\varepsilon$, and they depend on
the parameters of the channel in a more intricate way than
their asymptotic counterparts. As a result, it does not seem
likely that they could be expressed in a similarly compact
form as the asymptotic capacities, and if one is looking for
some universal statement on them, applicable to all channels
and all possible error bars, then probably the best one can
hope for are lower and upper estimates on their values. In
view of the above noted difference between the role of the
intervals $\alpha\in(0,1)$ and $\alpha\in(1,+\infty)$, it seems rather natural
to expect lower bounds in terms of the capacities $\chi^*_{S_\alpha}$ with
$\alpha\in(0,1)$ and upper bounds in terms of the capacities $\chi^*_{S_\alpha}$
with $\alpha\in(1,+\infty)$. While we left the question of optimality
open for the bounds provided in Section V (in fact, even to
formulate what optimality might mean in this setting is a non-trivial
question), it is somewhat reassuring that the optimal
asymptotic capacity can be recovered by applying our bounds
to several copies of the channel and letting the number of
copies go to infinity.

Appendix A 

A minimax theorem

Let $X$ and $Y$ be non-empty sets and $f:X\times Y\to\overline{\mathbb{R}} := \mathbb{R}\cup\{-\infty,+\infty\}$ be a function. Obviously, for any
$x_0\in X$ and $y_0\in Y$ we have $\inf_{x\in X}f(x,y_0) \le f(x_0,y_0) \le \sup_{y\in Y}f(x_0,y)$ and hence,

$$\sup_{y\in Y}\inf_{x\in X}f(x,y) \le \inf_{x\in X}\sup_{y\in Y}f(x,y). \tag{39}$$

Minimax theorems give sufficient conditions on when the
above inequality holds with equality. The following Lemma
A.1 is a step in the proof of Sion's minimax theorem in [30],
the proof of which we include for the readers' convenience.
We will use the notation $[f(\cdot,y)\le c]$ to denote the level set
$\{x\in X : f(x,y)\le c\}$ for some number $c\in\mathbb{R}$, and other
level sets are denoted similarly.

Lemma A.1. Assume that $X$ is a compact topological space
and $f(\cdot,y)$ is lower semicontinuous for every $y\in Y$. Assume,
moreover, that for any finite subset $Y'\subset Y$ we have

$$\inf_{x\in X}\max_{y\in Y'}f(x,y) \le \sup_{y\in Y}\inf_{x\in X}f(x,y). \tag{40}$$

Then the infima in (39) can be replaced with minima, and

$$\sup_{y\in Y}\min_{x\in X}f(x,y) = \min_{x\in X}\sup_{y\in Y}f(x,y).$$

Proof: The lower semi-continuity of $f(\cdot,y)$, $y\in Y$,
implies the lower semi-continuity of $\sup_y f(\cdot,y)$ and, since $X$
is compact, all the functions $f(\cdot,y)$, $y\in Y$, and $\sup_y f(\cdot,y)$
reach their infima on $X$. Hence, we can replace the infima
with minima.

To prove the main assertion, we have to show that

$$\min_{x\in X}\sup_{y\in Y}f(x,y) \le \sup_{y\in Y}\min_{x\in X}f(x,y).$$

Let $c<\min_{x\in X}\sup_{y\in Y}f(x,y)$ or equivalently, let $c$ be
such that $\bigcap_{y\in Y}[f(\cdot,y)\le c]=\emptyset$. Lower semicontinuity
of $f(\cdot,y)$ yields that $[f(\cdot,y)\le c]$ is closed (and hence
compact) for every $y\in Y$ and hence, there exist finitely many
$y_1,\ldots,y_r$ such that $\bigcap_{i=1}^{r}[f(\cdot,y_i)\le c]=\emptyset$ or equivalently,
$c<\min_{x\in X}\max_{1\le i\le r}f(x,y_i)$. By the assumption (40), we
obtain $c<\sup_{y\in Y}\min_{x\in X}f(x,y)$. Since this holds for any
$c<\min_{x\in X}\sup_{y\in Y}f(x,y)$, the assertion follows. ■

Corollary A.2. Let $X$ be a compact topological space, $Y$ be
a subset of the real line and let $f:X\times Y\to\overline{\mathbb{R}}$ be a function.
Assume that

(i) $f(\cdot,y)$ is lower semicontinuous for every $y\in Y$ and

(ii) $f(x,\cdot)$ is monotonic increasing for every $x\in X$, or
$f(x,\cdot)$ is monotonic decreasing for every $x\in X$.

Then the infima in (39) can be replaced with minima, and

$$\sup_{y\in Y}\min_{x\in X}f(x,y) = \min_{x\in X}\sup_{y\in Y}f(x,y).$$

Proof: By the monotonicity assumption, for any finite
subset $Y'=\{y_1,\ldots,y_r\}\subset Y$, there exists a $y^*\in\{y_1,\ldots,y_r\}$ such that

$$\max_{1\le i\le r}f(x,y_i) = f(x,y^*)$$

for all $x\in X$. Hence,

$$\min_{x\in X}\max_{1\le i\le r}f(x,y_i) = \min_{x\in X}f(x,y^*) \le \sup_{y\in Y}\min_{x\in X}f(x,y).$$

Thus, all the conditions of Lemma A.1 are satisfied, from
which the assertion follows. ■
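The role of the monotonicity assumption can be seen already on finite grids, where $X$ is trivially compact and every function is lower semicontinuous. The following sketch assumes NumPy; the random test matrices are hypothetical data. It evaluates both sides of (39) for a generic matrix $f(x_i,y_j)$ and for one whose rows are increasing in $y$.

```python
import numpy as np

def minimax_sides(f):
    """For f[i, j] = f(x_i, y_j) return (sup_y min_x f, min_x sup_y f)."""
    return f.min(axis=0).max(), f.max(axis=1).min()

rng = np.random.default_rng(1)
generic = rng.normal(size=(50, 40))
# Cumulative sums of positive numbers along axis 1: increasing in y for each x.
monotone = np.cumsum(rng.random(size=(50, 40)), axis=1)

lo_g, hi_g = minimax_sides(generic)
lo_m, hi_m = minimax_sides(monotone)
print(lo_g <= hi_g, lo_m == hi_m)
```

In the monotone case both sides reduce to the minimum over $x$ of the last column, which is the finite-grid counterpart of choosing $y^*$ in the proof. For the generic matrix only the inequality (39) is guaranteed.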

Appendix B 
The limit of the $\alpha$-capacities

In this Appendix we collect some properties of the quantities
$\chi_\alpha$ and $\tilde\chi_\alpha$ that are needed for the proof of Theorem V.2. To
simplify notation, we introduce

where $W:\mathcal{X}\to\mathcal{S}(\mathcal{H})$ is a fixed classical-quantum channel.
We start with the following:

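Lemma B.1. Assume that $\alpha>1$. Then, for any $p_1,p_2\in\mathcal{M}_f(\mathcal{X})$, $\eta\in(0,1)$ and $\sigma\in\mathcal{S}(\mathcal{H})$,

$$S_\alpha\bigl(\mathbb{E}_{(1-\eta)p_1+\eta p_2}\hat{W}\,\|\,((1-\eta)p_1+\eta p_2)\otimes\sigma\bigr) \tag{41}$$

$$\ge (1-\eta)\,S_\alpha\bigl(\mathbb{E}_{p_1}\hat{W}\,\|\,p_1\otimes\sigma\bigr) + \eta\,S_\alpha\bigl(\mathbb{E}_{p_2}\hat{W}\,\|\,p_2\otimes\sigma\bigr) \tag{42}$$

$$\ge (1-\eta)\chi_\alpha(p_1)+\eta\chi_\alpha(p_2). \tag{43}$$

In particular, the function $p\mapsto\chi_\alpha(p)$ is concave on $\mathcal{M}_f(\mathcal{X})$.

Proof: The inequality in (43) is obvious from (34). One
can easily verify that the expression in (41) is equal to
$+\infty$ if and only if the expression in (42) is equal to $+\infty$,
and otherwise the inequality between the two follows by a
straightforward computation from the concavity of the function
$\frac{1}{\alpha-1}\log$. The last assertion follows by taking the infimum in
$\sigma$ in the inequality between (41) and (43). ■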

The following statement is essentially Lemma 2 from [31]:

Proposition B.2. Assume that $\operatorname{ran}W$ is compact and $\alpha>1$.
Then

Proof: Using the concavity established in Lemma B.1,
one can follow the proof of Lemma 2 in [31] to obtain the
assertion. (Note that in [31], $\mathcal{X}$ was assumed to be finite, but
that does not make a difference in the proof.) ■

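Let $m := (\dim\mathcal{H})^2+1$, and let

$$\mathcal{M}_m(\operatorname{ran}W) := \{\tilde p\in\mathcal{M}_f(\operatorname{ran}W) : |\operatorname{supp}\tilde p|\le m\}$$

denote the set of probability measures supported on not
more than $m$ points in $\operatorname{ran}W$. By Carathéodory's theorem
[41, Theorem (2.3)], for every $p\in\mathcal{M}_f(\mathcal{X})$, there exists a
$\tilde p\in\mathcal{M}_m(\operatorname{ran}W)$ such that

$$\chi_\alpha(p) = \tilde\chi_\alpha(\tilde p) := \frac{\alpha}{\alpha-1}\log\operatorname{Tr}\Bigl(\sum_{\omega}\tilde p(\omega)\,\omega^{\alpha}\Bigr)^{1/\alpha}.$$

Note that $\tilde\chi_\alpha$ can also be defined by replacing $\mathcal{X}$ with $\operatorname{ran}W$
and $W$ with the identity map $\mathrm{id}$ on $\operatorname{ran}W$ in (35), i.e., for
each $\tilde p\in\mathcal{M}_f(\operatorname{ran}W)$,

$$\tilde\chi_\alpha(\tilde p) = \inf_{\sigma\in\mathcal{S}(\mathcal{H})} S_\alpha\bigl(\mathbb{E}_{\tilde p}\widehat{\mathrm{id}}\,\|\,\tilde p\otimes\sigma\bigr). \tag{44}$$

The functions $\chi_1$ and $\tilde\chi_1$ are defined simply by replacing $\alpha$
with 1 in (35) and in (44), respectively. Note that

$$\chi^*_{S_\alpha}(W) = \sup_{p\in\mathcal{M}_f(\mathcal{X})}\chi_\alpha(p) = \sup_{\tilde p\in\mathcal{M}_m(\operatorname{ran}W)}\tilde\chi_\alpha(\tilde p)$$

for every $\alpha\in[0,+\infty)$.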

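Lemma B.3. The functions $\alpha\mapsto\chi_\alpha(p)$ and $\alpha\mapsto\tilde\chi_\alpha(\tilde p)$ are
monotonically increasing on $[0,+\infty)$ for all $p\in\mathcal{M}_f(\mathcal{X})$ and
$\tilde p\in\mathcal{M}_f(\operatorname{ran}W)$, respectively, and

$$\lim_{\alpha\to 1}\chi_\alpha(p) = \chi_1(p),\qquad \lim_{\alpha\to 1}\tilde\chi_\alpha(\tilde p) = \tilde\chi_1(\tilde p).$$

Proof: The assertion on the monotonicity follows immediately
from the monotonicity of the Rényi relative entropies in
the parameter $\alpha$. We prove the assertion on the limit separately
for $\alpha\nearrow 1$ and for $\alpha\searrow 1$. In the second case, we have

$$\begin{aligned}
\lim_{\alpha\searrow 1}\chi_\alpha(p) = \inf_{\alpha>1}\chi_\alpha(p) &= \inf_{\alpha>1}\inf_{\sigma\in\mathcal{S}(\mathcal{H})} S_\alpha\bigl(\mathbb{E}_p\hat{W}\,\|\,p\otimes\sigma\bigr)\\
&= \inf_{\sigma\in\mathcal{S}(\mathcal{H})}\inf_{\alpha>1} S_\alpha\bigl(\mathbb{E}_p\hat{W}\,\|\,p\otimes\sigma\bigr)\\
&= \inf_{\sigma\in\mathcal{S}(\mathcal{H})} S\bigl(\mathbb{E}_p\hat{W}\,\|\,p\otimes\sigma\bigr) = \chi_1(p).
\end{aligned}$$

For fixed $p\in\mathcal{M}_f(\mathcal{X})$ and $\alpha\in[0,+\infty)$, the map
$\sigma\mapsto S_\alpha\bigl(\mathbb{E}_p\hat{W}\,\|\,p\otimes\sigma+\varepsilon I\bigr)$ is continuous on the compact
set $\mathcal{S}(\mathcal{H})$ and hence the map $\sigma\mapsto S_\alpha\bigl(\mathbb{E}_p\hat{W}\,\|\,p\otimes\sigma\bigr)$ is lower
semicontinuous, due to (13). On the other hand, for fixed
$\sigma\in\mathcal{S}(\mathcal{H})$, the map $\alpha\mapsto S_\alpha\bigl(\mathbb{E}_p\hat{W}\,\|\,p\otimes\sigma\bigr)$ is monotonic
increasing in $\alpha$ and hence, by Corollary A.2, we have

$$\begin{aligned}
\lim_{\alpha\nearrow 1}\chi_\alpha(p) &= \sup_{\alpha<1}\inf_{\sigma\in\mathcal{S}(\mathcal{H})} S_\alpha\bigl(\mathbb{E}_p\hat{W}\,\|\,p\otimes\sigma\bigr)\\
&= \inf_{\sigma\in\mathcal{S}(\mathcal{H})}\sup_{\alpha<1} S_\alpha\bigl(\mathbb{E}_p\hat{W}\,\|\,p\otimes\sigma\bigr)\\
&= \inf_{\sigma\in\mathcal{S}(\mathcal{H})} S\bigl(\mathbb{E}_p\hat{W}\,\|\,p\otimes\sigma\bigr) = \chi_1(p).
\end{aligned}$$

The proof for $\lim_{\alpha\to 1}\tilde\chi_\alpha(\tilde p)$ goes exactly the same way. ■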
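The monotonicity in $\alpha$ and the behaviour near $\alpha=1$ can be observed numerically from the closed formula $\chi_\alpha(p)=\frac{\alpha}{\alpha-1}\log\operatorname{Tr}\bigl(\sum_x p(x)W_x^{\alpha}\bigr)^{1/\alpha}$. The sketch below assumes NumPy, natural logarithms, and random full-rank outputs; the helper names are hypothetical. It also uses the standard fact that $\chi_1(p)$ is the Holevo quantity $S(\sum_x p(x)W_x)-\sum_x p(x)S(W_x)$.

```python
import numpy as np

def random_state(d, rng):
    """Random full-rank density matrix (hypothetical helper)."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

def mpow(a, s):
    """Real power of a positive definite Hermitian matrix."""
    w, v = np.linalg.eigh(a)
    return (v * w**s) @ v.conj().T

def chi(states, p, alpha):
    """chi_alpha(p) = alpha/(alpha-1) * log Tr (sum_x p(x) W_x^alpha)^(1/alpha)."""
    omega = mpow(sum(px * mpow(w, alpha) for px, w in zip(p, states)), 1 / alpha)
    return alpha / (alpha - 1) * np.log(np.trace(omega).real)

def entropy(r):
    """von Neumann entropy with natural logarithm."""
    w = np.linalg.eigvalsh(r)
    w = w[w > 1e-15]
    return -(w * np.log(w)).sum()

def holevo(states, p):
    """Holevo quantity, which equals chi_1(p)."""
    avg = sum(px * w for px, w in zip(p, states))
    return entropy(avg) - sum(px * entropy(w) for px, w in zip(p, states))

rng = np.random.default_rng(0)
states = [random_state(3, rng) for _ in range(4)]
p = np.array([0.1, 0.2, 0.3, 0.4])
alphas = [0.2, 0.5, 0.9, 0.999, 1.001, 1.5, 3.0]
values = [chi(states, p, a) for a in alphas]
h = holevo(states, p)
print(values, h)
```

The printed values should be nondecreasing, with the Holevo quantity sandwiched between the values at $\alpha=0.999$ and $\alpha=1.001$.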

The following Lemma was shown in [42]. For readers'
convenience, we include a proof here.

Lemma B.4. If $\operatorname{ran}W$ is compact then $\mathcal{M}_m(\operatorname{ran}W)$ can be
equipped with a topology $\tau$ with respect to which $\mathcal{M}_m(\operatorname{ran}W)$
is compact and $\tilde\chi_\alpha$ is continuous.

Proof: Let $S_m := \{(\lambda_1,\ldots,\lambda_m) : \lambda_1,\ldots,\lambda_m\ge 0,\ \sum_{i=1}^{m}\lambda_i=1\}$ denote the $m$-dimensional probability
simplex, and define $\Omega_m(W) := S_m\times(\operatorname{ran}W)^m = \{(\lambda,\omega) : \lambda\in S_m,\ \omega_1,\ldots,\omega_m\in\operatorname{ran}W\}$. Compactness of
$\operatorname{ran}W$ yields that $\Omega_m(W)$ is compact with respect to its natural
topology. Let $\pi_m:\Omega_m(W)\to\mathcal{M}_m(\operatorname{ran}W)$, $\pi_m(\lambda,\omega) := \sum_{i=1}^{m}\lambda_i\delta_{\omega_i}$, where $\delta_{\omega_i}$ denotes the Dirac measure concentrated
at $\omega_i$. We define the topology $\tau$ on $\mathcal{M}_m(\operatorname{ran}W)$ to
be the factor topology, i.e., the finest topology with respect
to which $\pi_m$ is continuous. Being the continuous image of
a compact set, $\mathcal{M}_m(\operatorname{ran}W)$ is also compact. One can easily
see that $\tilde\chi_\alpha\circ\pi_m$ is continuous on $\Omega_m(W)$, which in turn is
equivalent to the continuity of $\tilde\chi_\alpha$ with respect to $\tau$. ■



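The following statement was shown in Lemma 3 of [31] for
the case where $\mathcal{X}$ is finite. Here we give an alternative proof,
using the minimax theorem established in Appendix A, that
covers the general case.

Proposition B.5.

$$\lim_{\alpha\to 1}\chi^*_{S_\alpha}(W) = \chi^*_S(W).$$

Proof: We prove separately the cases $\alpha\nearrow 1$ and $\alpha\searrow 1$.
In the first case, the assertion follows immediately from
Lemma B.3, as

$$\begin{aligned}
\lim_{\alpha\nearrow 1}\chi^*_{S_\alpha}(W) &= \sup_{\alpha\in[0,1)}\sup_{p\in\mathcal{M}_f(\mathcal{X})}\chi_\alpha(p)\\
&= \sup_{p\in\mathcal{M}_f(\mathcal{X})}\sup_{\alpha\in[0,1)}\chi_\alpha(p)\\
&= \sup_{p\in\mathcal{M}_f(\mathcal{X})}\chi_1(p) = \chi^*_S(W).
\end{aligned}$$

Note that the function $f(\tilde p,\alpha) := -\tilde\chi_\alpha(\tilde p)$ is monotonic
decreasing in its second variable on $Y := (1,+\infty)$ and
continuous in its first variable on the compact space $X := \mathcal{M}_m(\operatorname{ran}W)$, due to Lemma B.4. Hence, we can apply the
minimax theorem of Corollary A.2 to obtain

$$\begin{aligned}
\lim_{\alpha\searrow 1}\chi^*_{S_\alpha}(W) &= \inf_{\alpha>1}\max_{\tilde p\in\mathcal{M}_m(\operatorname{ran}W)}\tilde\chi_\alpha(\tilde p)\\
&= \max_{\tilde p\in\mathcal{M}_m(\operatorname{ran}W)}\inf_{\alpha>1}\tilde\chi_\alpha(\tilde p)\\
&= \max_{\tilde p\in\mathcal{M}_m(\operatorname{ran}W)}\tilde\chi_1(\tilde p) = \chi^*_S(W).
\end{aligned}$$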